High molecular weight surface proteins of non-typeable Haemophilus

ABSTRACT

High molecular weight surface proteins of non-typeable Haemophilus influenzae which exhibit immunogenic properties, and genes encoding the same, are described. Specifically, genes coding for two immunodominant high molecular weight proteins, HMW1 and HMW2, have been cloned, expressed and sequenced, while genes coding for high molecular weight proteins HMW3 and HMW4 have been cloned, expressed and partially sequenced.

FIELD OF INVENTION

This invention relates to high molecular weight proteins of non-typeable Haemophilus.

BACKGROUND TO THE INVENTION

Non-typeable Haemophilus influenzae are non-encapsulated organisms that are defined by their lack of reactivity with antisera against known H. influenzae capsular antigens.

These organisms commonly inhabit the upper respiratory tract of humans and are frequently responsible for infections such as otitis media, sinusitis, conjunctivitis, bronchitis and pneumonia. Since these organisms do not have a polysaccharide capsule, they are not controlled by the present Haemophilus influenzae type b (Hib) vaccines, which are directed towards Hib bacterial capsular polysaccharides. The non-typeable strains, however, do produce surface antigens that can elicit bactericidal antibodies. Two of the major outer membrane proteins, P2 and P6, have been identified as targets of human serum bactericidal activity. However, it has been shown that the P2 protein sequence is variable, in particular in the non-typeable Haemophilus strains. Thus, a P2-based vaccine would not protect against all strains of the organism.

Barenkamp et al. (Pediatr. Infect. Dis. J., 9:333-339, 1990) previously identified a group of high-molecular-weight (HMW) proteins that appeared to be major targets of antibodies present in human convalescent sera. Examination of a series of middle ear isolates revealed the presence of one or two such proteins in most strains. However, prior to the present invention, the structures of these proteins were unknown, and the proteins had not been obtained as pure isolates.

SUMMARY OF INVENTION

The inventors, in an effort to further characterize the high molecular weight (HMW) Haemophilus proteins, have cloned, expressed and sequenced the genes coding for two immunodominant HMW proteins (designated HMW1 and HMW2) from a prototype non-typeable Haemophilus strain, and have cloned, expressed and almost completely sequenced the genes coding for two additional immunodominant HMW proteins (designated HMW3 and HMW4) from another non-typeable Haemophilus strain.

In accordance with one aspect of the present invention, therefore, there is provided an isolated and purified gene coding for a high molecular weight protein of a non-typeable Haemophilus strain, particularly a gene coding for protein HMW1, HMW2, HMW3 or HMW4, as well as any variant or fragment of such protein which retains the immunological ability to protect against disease caused by a non-typeable Haemophilus strain. In another aspect, the invention provides a high molecular weight protein of non-typeable Haemophilus influenzae which is encoded by these genes.

BRIEF DESCRIPTION OF DRAWINGS

FIG. 1A-1G is a DNA sequence of a gene coding for protein HMW1 (SEQ ID NO: 1);

FIG. 2A & 2B is a derived amino acid sequence of protein HMW1 (SEQ ID NO: 2);

FIG. 3A-3G is a DNA sequence of a gene coding for protein HMW2 (SEQ ID NO: 3);

FIG. 4A & 4B is a derived amino acid sequence of HMW2 (SEQ ID NO: 4);

FIG. 5A shows restriction maps of representative recombinant phages which contained the HMW1 or HMW2 structural genes, the locations of the structural genes being indicated by the shaded bars;

FIG. 5B shows the restriction map of the T7 expression vector pT7-7;

FIG. 6A-6L contains the DNA sequence of a gene cluster for the hmw1 gene (SEQ ID NO: 5), comprising nucleotides 351 to 4958 (ORF a) (as in FIG. 1), as well as two additional downstream genes in the 3' flanking region, comprising ORF b (nucleotides 5114-6748) and ORF c (nucleotides 7062-9011);

FIG. 7A-7L contains the DNA sequence of a gene cluster for the hmw2 gene (SEQ ID NO: 6), comprising nucleotides 792 to 5222 (ORF a) (as in FIG. 3), as well as two additional downstream genes in the 3' flanking region, comprising ORF b (nucleotides 5375-7009) and ORF c (nucleotides 7249-9198);

FIG. 8A-8F is a partial DNA sequence of a gene coding for protein HMW3 (SEQ ID NO: 7);

FIG. 9A-9F is a partial DNA sequence of a gene coding for protein HMW4 (SEQ ID NO: 8); and

FIG. 10A-10L is a comparison table for the derived amino acid sequences of proteins HMW1, HMW2, HMW3 and HMW4.

GENERAL DESCRIPTION OF INVENTION

The DNA sequences of the genes coding for HMW1 and HMW2, shown in FIGS. 1 and 3 respectively, were shown to be about 80% identical, with the first 1259 base pairs of the genes being identical. The derived amino acid sequences of the two HMW proteins, shown in FIGS. 2 and 4 respectively, are about 70% identical. Furthermore, the encoded proteins are antigenically related to the filamentous hemagglutinin surface protein of Bordetella pertussis. A monoclonal antibody prepared against filamentous hemagglutinin (FHA) of Bordetella pertussis was found to recognize both of the high molecular weight proteins. These data suggest that the HMW and FHA proteins may serve similar biological functions. The derived amino acid sequences of the HMW1 and HMW2 proteins show sequence similarity to that of the FHA protein. It has further been shown that these antigenically related proteins are produced by the majority of the non-typeable strains of Haemophilus. Antisera raised against the protein expressed by the HMW1 gene recognize both the HMW2 protein and the B. pertussis FHA. The present invention includes an isolated and purified high molecular weight protein of non-typeable Haemophilus which is antigenically related to the B. pertussis FHA, and which may be obtained from natural sources or produced recombinantly.

A phage genomic library of a known strain of non-typeable Haemophilus was prepared by standard methods, and the library was screened for clones expressing high molecular weight proteins using a high-titre antiserum against HMWs. A number of strongly reactive DNA clones were plaque-purified and subcloned into a T7 expression plasmid. All of them were found to express one or the other of two high-molecular-weight proteins, designated HMW1 and HMW2, with apparent molecular weights of 125 and 120 kDa, respectively, encoded by open reading frames of 4.6 kb and 4.4 kb, respectively.

Representative clones expressing either HMW1 or HMW2 were further characterized, and the genes were isolated, purified and sequenced. The DNA sequence of HMW1 is shown in FIG. 1A-1G and the corresponding derived amino acid sequence in FIG. 2A & 2B. Similarly, the DNA sequence of HMW2 is shown in FIG. 3A-3G and the corresponding derived amino acid sequence in FIG. 4A & 4B. Partial purification of the isolated proteins and N-terminal sequence analysis indicated that the expressed proteins are truncated, since their sequence starts at residue number 442 of both the full-length HMW1 and HMW2 gene products.

Subcloning studies with the hmw1 and hmw2 genes indicated that correct processing of the HMW proteins required the products of additional downstream genes. Both the hmw1 and hmw2 genes have been found to be flanked by two additional downstream open reading frames (ORFs), designated b and c (see FIGS. 6A-6L and 7A-7L).

The b ORFs are 1635 bp in length, extending from nucleotides 5114 to 6748 in the case of hmw1 and nucleotides 5375 to 7009 in the case of hmw2, with their derived amino acid sequences 99% identical. The derived amino acid sequences demonstrate similarity with the derived amino acid sequences of two genes which encode proteins required for secretion and activation of hemolysins of P. mirabilis and S. marcescens.
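The ORF lengths quoted here, and the coordinates given in the descriptions of FIGS. 6 and 7, are related by inclusive counting (end minus start plus one). The short Python sketch below is an illustrative arithmetic check, not part of the original work, that recomputes each length and confirms it is a whole number of codons. The helper name `inclusive_length` is hypothetical.

```python
def inclusive_length(start: int, end: int) -> int:
    """Length in bp of a region spanning nucleotides start..end, both ends counted."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates as given in the descriptions of FIGS. 6A-6L and 7A-7L.
ORF_COORDINATES = {
    "hmw1 ORF a": (351, 4958),
    "hmw1 ORF b": (5114, 6748),
    "hmw1 ORF c": (7062, 9011),
    "hmw2 ORF a": (792, 5222),
    "hmw2 ORF b": (5375, 7009),
    "hmw2 ORF c": (7249, 9198),
}

if __name__ == "__main__":
    for name, (start, end) in ORF_COORDINATES.items():
        length = inclusive_length(start, end)
        # A stretch read in a single frame should be a whole number of codons.
        print(f"{name}: {length} bp, {length // 3} codons, multiple of 3: {length % 3 == 0}")
```

The computed lengths (4,608, 1,635 and 1,950 bp for the hmw1 cluster; 4,431, 1,635 and 1,950 bp for the hmw2 cluster) agree with the figures stated in the text.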

The c ORFs are 1950 bp in length, extending from nucleotides 7062 to 9011 in the case of hmw1 and nucleotides 7249 to 9198 in the case of hmw2, with their derived amino acid sequences 96% identical. The hmw1 c ORF is preceded by a series of 9-bp direct tandem repeats. In plasmid subclones, interruption of the hmw1 b or c ORF results in defective processing and secretion of the hmw1 structural gene product.
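Direct tandem repeats, such as the 9-bp repeats upstream of the hmw1 c ORF or the 16 copies of ATCTTTC reported for the hmw1 5'-flanking region, can be located by counting back-to-back copies of a repeat unit. The sketch below is a hypothetical helper, not a tool used in the original work, and it is demonstrated on a synthetic sequence rather than the actual flanking DNA shown in the figures.

```python
def max_tandem_copies(seq: str, unit: str) -> int:
    """Return the largest number of consecutive, back-to-back copies of `unit` in `seq`."""
    seq, unit = seq.upper(), unit.upper()
    k = len(unit)
    if k == 0:
        raise ValueError("repeat unit must be non-empty")
    best = 0
    # A tandem run can start at any position, so scan each of the k phases separately.
    for phase in range(k):
        run = 0
        for i in range(phase, len(seq) - k + 1, k):
            if seq[i:i + k] == unit:
                run += 1
                best = max(best, run)
            else:
                run = 0  # the run is broken; start counting again
    return best


if __name__ == "__main__":
    # Synthetic example: 16 copies of ATCTTTC flanked by unrelated bases.
    demo = "GGC" + "ATCTTTC" * 16 + "TTAAG"
    print(max_tandem_copies(demo, "ATCTTTC"))  # 16
```

Scanning every phase separately keeps the logic simple; for long genomic sequences a dedicated repeat finder would be more appropriate.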

The two high molecular weight proteins have been isolated and purified and shown to be partially protective against otitis media in chinchillas and to function as adhesins. These results indicate the potential for use of such high molecular weight proteins, and of structurally related proteins of other non-typeable strains of Haemophilus influenzae, as components in non-typeable Haemophilus influenzae vaccines.

Since the proteins provided herein are good cross-reactive antigens and are present in the majority of non-typeable Haemophilus strains, it is evident that these HMW proteins may become integral constituents of a universal Haemophilus vaccine. Indeed, these proteins may be used not only as protective antigens against otitis, sinusitis and bronchitis caused by the non-typeable Haemophilus strains, but also as carriers for the protective Hib polysaccharides in a conjugate vaccine against meningitis. The proteins also may be used as carriers for other antigens, haptens and polysaccharides from other organisms, so as to induce immunity to such antigens, haptens and polysaccharides.

The nucleotide sequences encoding two high molecular weight proteins of a different non-typeable Haemophilus strain (designated HMW3 and HMW4) have been largely elucidated and are presented in FIGS. 8A-8F and 9A-9F. HMW3 has an apparent molecular weight of 125 kDa, while HMW4 has an apparent molecular weight of 123 kDa. These high molecular weight proteins are antigenically related to the HMW1 and HMW2 proteins and to FHA. Sequencing of HMW3 is approximately 85% complete and that of HMW4 approximately 95% complete, with short stretches at the 5' end of each gene remaining to be sequenced.

FIG. 10A-10L contains a multiple sequence comparison of the derived amino acid sequences for the four high molecular weight proteins identified herein. As may be seen from this comparison, stretches of identical peptide sequence may be found throughout the length of the comparison, with HMW3 more closely resembling HMW1 and HMW4 more closely resembling HMW2. This information is highly suggestive of considerable sequence homology between high molecular weight proteins from various non-typeable Haemophilus strains.

In addition, mutants of non-typeable H. influenzae strains that are deficient in expression of HMW1 or HMW2 or both have been constructed and examined for their capacity to adhere to cultured human epithelial cells. The hmw1 and hmw2 gene clusters have been expressed in E. coli and have been examined for in vitro adherence. The results of such experimentation demonstrate that both HMW1 and HMW2 mediate attachment and hence are adhesins, and that this function is present even in the absence of other H. influenzae surface structures.

With the isolation and purification of the high molecular weight proteins, the inventors are able to determine the major protective epitopes by conventional epitope mapping and to synthesize peptides corresponding to these determinants for incorporation into fully synthetic or recombinant vaccines. Accordingly, the invention also comprises a synthetic peptide having an amino acid sequence corresponding to at least one protective epitope of a high molecular weight protein of a non-typeable Haemophilus influenzae. Such peptides, which may be of varying length, constitute portions of the high-molecular-weight proteins and can be used to induce immunity, either directly or as part of a conjugate, against the relevant organisms, and thus constitute vaccines for protection against the corresponding diseases.

The present invention also provides any variant or fragment of the proteins that retains the potential immunological ability to protect against disease caused by non-typeable Haemophilus strains. The variants may be constructed by partial deletions or mutations of the genes and expression of the resulting modified genes to give the protein variations.

EXAMPLES

EXAMPLE 1

Non-typeable H. influenzae strains 5 and 12 were isolated in pure culture from the middle ear fluid of children with acute otitis media. Chromosomal DNA from strain 12, the source of the genes encoding proteins HMW1 and HMW2, was partially digested with Sau3A and fractionated on sucrose gradients. Fractions containing DNA fragments in the 9 to 20 kbp range were pooled, and a library was prepared by ligation into λEMBL3 arms. Ligation mixtures were packaged in vitro and plate-amplified in a P2 lysogen of E. coli LE392.

For plasmid subcloning studies, DNA from a representative recombinant phage was subcloned into the T7 expression plasmid pT7-7, containing the T7 RNA polymerase promoter φ10, a ribosome-binding site and the translational start site for the T7 gene 10 protein upstream from a multiple cloning site (see FIG. 5B).

DNA sequence analysis was performed by the dideoxy method, and both strands of the HMW1 gene and a single strand of the HMW2 gene were sequenced.

Western immunoblot analysis was performed to identify the recombinant proteins being produced by reactive phage clones. Phage lysates grown in LE392 cells, or plaques picked directly from a lawn of LE392 cells on YT plates, were solubilized in gel electrophoresis sample buffer prior to electrophoresis. Sodium dodecyl sulfate (SDS)-polyacrylamide gel electrophoresis was performed on 7.5% or 11% polyacrylamide modified Laemmli gels. After transfer of the proteins to nitrocellulose sheets, the sheets were probed sequentially with an E. coli-absorbed human serum sample containing high-titer antibody to the high-molecular-weight proteins and then with alkaline phosphatase-conjugated goat anti-human immunoglobulin G (IgG) second antibody. Sera from healthy adults contain high-titer antibody directed against surface-exposed high-molecular-weight proteins of non-typeable H. influenzae. One such serum sample was used as the screening antiserum after having been extensively absorbed with LE392 cells.

To identify recombinant proteins being produced by E. coli transformed with recombinant plasmids, the plasmids of interest were used to transform E. coli BL21(DE3)/pLysS. The transformed strains were grown to an A₆₀₀ of 0.5 in L broth containing 50 μg of ampicillin per ml. IPTG was then added to a final concentration of 1 mM. One hour later, cells were harvested, and a sonicate of the cells was prepared. The protein concentrations of the samples were determined by the bicinchoninic acid method. Cell sonicates containing 100 μg of total protein were solubilized in electrophoresis sample buffer, subjected to SDS-polyacrylamide gel electrophoresis, and transferred to nitrocellulose. The nitrocellulose was then probed sequentially with the E. coli-absorbed adult serum sample and then with alkaline phosphatase-conjugated goat anti-human IgG second antibody.

Western immunoblot analysis also was performed to determine whether homologous and heterologous non-typeable H. influenzae strains expressed high-molecular-weight proteins antigenically related to the protein encoded by the cloned HMW1 gene (rHMW1). Sonicates of bacterial cells were solubilized in electrophoresis sample buffer, subjected to SDS-polyacrylamide gel electrophoresis, and transferred to nitrocellulose. The nitrocellulose was probed sequentially with polyclonal rabbit rHMW1 antiserum and then with alkaline phosphatase-conjugated goat anti-rabbit IgG second antibody.

Finally, Western immunoblot analysis was performed to determine whether non-typeable Haemophilus strains expressed proteins antigenically related to the filamentous hemagglutinin protein of Bordetella pertussis. Monoclonal antibody X3C, a murine immunoglobulin G (IgG) antibody which recognizes filamentous hemagglutinin, was used to probe cell sonicates by Western blot. An alkaline phosphatase-conjugated goat anti-mouse IgG second antibody was used for detection.

To generate recombinant protein antiserum, E. coli BL21(DE3)/pLysS was transformed with pHMW1-4, and expression of recombinant protein was induced with IPTG, as described above. A cell sonicate of the bacterial cells was prepared and separated into a supernatant and a pellet fraction by centrifugation at 10,000×g for 30 min. The recombinant protein fractionated with the pellet fraction. A rabbit was subcutaneously immunized on a biweekly schedule with 1 mg of protein from the pellet fraction, the first dose given with Freund's complete adjuvant and subsequent doses with Freund's incomplete adjuvant. Following the fourth injection, the rabbit was bled. Prior to use in the Western blot assay, the antiserum was absorbed extensively with sonicates of the host E. coli strain transformed with cloning vector alone.
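The centrifugation step is specified as a relative centrifugal force (10,000×g). Converting that to a rotor speed depends on the rotor radius, which the text does not give. The sketch below uses the standard relation RCF = 1.118×10⁻⁵ × r(cm) × rpm², and the 10 cm radius in it is an assumption chosen only for illustration.

```python
import math

RCF_CONSTANT = 1.118e-5  # from RCF = r * (2*pi*rpm/60)**2 / g, with r in cm


def rcf_from_rpm(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (multiples of g) at a given speed and rotor radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_for_rcf(rcf: float, radius_cm: float) -> float:
    """Rotor speed (rpm) needed to reach a target RCF at the given radius."""
    if rcf <= 0 or radius_cm <= 0:
        raise ValueError("rcf and radius must be positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


if __name__ == "__main__":
    radius = 10.0  # ASSUMED rotor radius in cm; substitute the real rotor's value
    print(f"{rpm_for_rcf(10_000, radius):.0f} rpm")
```

With the assumed 10 cm radius this gives roughly 9,460 rpm; a rotor with a different radius needs a different speed for the same 10,000×g.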

To assess the sharing of antigenic determinants between HMW1 and filamentous hemagglutinin, enzyme-linked immunosorbent assay (ELISA) plates (Costar, Cambridge, Mass.) were coated with 60 μl per well of a 4-μg/ml solution of filamentous hemagglutinin in Dulbecco's phosphate-buffered saline for 2 h at room temperature. Wells were blocked for 1 h with 1% bovine serum albumin in Dulbecco's phosphate-buffered saline prior to addition of serum dilutions. rHMW1 antiserum was serially diluted in 0.1% Brij (Sigma, St. Louis, Mo.) in Dulbecco's phosphate-buffered saline and incubated for 3 h at room temperature. After being washed, the plates were incubated with peroxidase-conjugated goat anti-rabbit IgG antibody (Bio-Rad) for 2 h at room temperature and subsequently developed with 2,2'-azino-bis(3-ethylbenzthiazoline-6-sulfonic acid) (Sigma) at a concentration of 0.54 mg/ml in 0.1 M sodium citrate buffer, pH 4.2, containing 0.03% H₂O₂. Absorbances were read on an automated ELISA reader.
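The coating step implies a simple per-well amount: 60 μl of a 4-μg/ml solution delivers 0.24 μg of filamentous hemagglutinin per well. The text does not state the starting dilution or the dilution factor used for the rHMW1 antiserum, so the dilution series in this sketch (starting at 1:100, twofold steps) is an assumed example, not the protocol actually used.

```python
from typing import List


def amount_per_well_ug(volume_ul: float, concentration_ug_per_ml: float) -> float:
    """Mass of coating antigen (μg) delivered in `volume_ul` of solution."""
    return volume_ul / 1000.0 * concentration_ug_per_ml


def serial_dilutions(start: int, factor: int, steps: int) -> List[int]:
    """Reciprocal dilutions of a serial series, e.g. 100 means 1:100."""
    return [start * factor ** n for n in range(steps)]


if __name__ == "__main__":
    # 60 μl per well of 4 μg/ml filamentous hemagglutinin, as in the text.
    print(f"{amount_per_well_ug(60, 4):.2f} μg per well")  # 0.24 μg per well
    # ASSUMED series: 1:100 start with twofold steps (not stated in the text).
    print(["1:%d" % d for d in serial_dilutions(100, 2, 6)])
```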

Recombinant phage expressing HMW1 or HMW2 were recovered as follows. The non-typeable H. influenzae strain 12 genomic library was screened for clones expressing high-molecular-weight proteins with an E. coli-absorbed human serum sample containing a high titer of antibodies directed against the high-molecular-weight proteins.

Numerous strongly reactive clones were identified along with more weakly reactive ones. Twenty strongly reactive clones were plaque-purified and examined by Western blot for expression of recombinant proteins. Each of the strongly reactive clones expressed one of two types of high-molecular-weight proteins, designated HMW1 and HMW2. The major immunoreactive protein bands in the HMW1 and HMW2 lysates migrated with apparent molecular masses of 125 and 120 kDa, respectively. In addition to the major bands, each lysate contained minor protein bands of higher apparent molecular weight. Protein bands seen in the HMW2 lysates at molecular masses of less than 120 kDa were not regularly observed and presumably represent proteolytic degradation products. Lysates of LE392 infected with the λEMBL3 cloning vector alone were non-reactive when immunologically screened with the same serum sample. Thus, the observed activity was not due to cross-reactive E. coli proteins or λEMBL3-encoded proteins. Furthermore, the recombinant proteins were not simply binding immunoglobulin nonspecifically, since the proteins were not reactive with the goat anti-human IgG conjugate alone, with normal rabbit sera, or with serum from a number of healthy young infants.

Representative clones expressing either the HMW1 or HMW2 recombinant proteins were characterized further. The restriction maps of the two phage types were different from each other, including the regions encoding the HMW1 and HMW2 structural genes. FIG. 5A shows restriction maps of representative recombinant phage which contained the HMW1 or HMW2 structural genes. The locations of the structural genes are indicated by the shaded bars.

HMW1 plasmid subclones were constructed by using the T7 expression plasmid pT7-7 (FIGS. 5A and 5B). HMW2 plasmid subclones also were constructed, and the results with these latter subclones were similar to those observed with the HMW1 constructs.

The approximate location and direction of transcription of the HMW1 structural gene were initially determined by using plasmid pHMW1 (FIG. 5A). This plasmid was constructed by inserting the 8.5-kb BamHI-SalI fragment from λHMW1 into BamHI- and SalI-cut pT7-7. E. coli transformed with pHMW1 expressed an immunoreactive recombinant protein with an apparent molecular mass of 115 kDa, which was strongly inducible with IPTG. This protein was significantly smaller than the 125-kDa major protein expressed by the parent phage, indicating that it either was being expressed as a fusion protein or was truncated at the carboxy terminus.

To more precisely localize the 3' end of the structural gene, additional plasmids were constructed with progressive deletions from the 3' end of the pHMW1 construct. Plasmid pHMW1-1 was constructed by digestion of pHMW1 with PstI, isolation of the resulting 8.8-kb fragment, and religation. Plasmid pHMW1-2 was constructed by digestion of pHMW1 with HindIII, isolation of the resulting 7.5-kb fragment, and religation. E. coli transformed with either plasmid pHMW1-1 or pHMW1-2 also expressed an immunoreactive recombinant protein with an apparent molecular mass of 115 kDa. These results indicated that the 3' end of the structural gene was 5' of the HindIII site.

To more precisely localize the 5' end of the gene, plasmids pHMW1-4 and pHMW1-7 were constructed. Plasmid pHMW1-4 was constructed by cloning the 5.1-kb BamHI-HindIII fragment from λHMW1 into a pT7-7-derived plasmid containing the upstream 3.8-kb EcoRI-BamHI fragment. E. coli transformed with pHMW1-4 expressed an immunoreactive protein with an apparent molecular mass of approximately 160 kDa. Although protein production was inducible with IPTG, the levels of protein production in these transformants were substantially lower than those with the pHMW1-2 transformants described above. Plasmid pHMW1-7 was constructed by digesting pHMW1-4 with NdeI and SpeI. The 9.0-kbp fragment generated by this double digestion was isolated, blunt ended, and religated. E. coli transformed with pHMW1-7 also expressed an immunoreactive protein with an apparent molecular mass of 160 kDa, a protein identical in size to that expressed by the pHMW1-4 transformants. This result indicated that the initiation codon for the HMW1 structural gene was 3' of the SpeI site. DNA sequence analysis confirmed this conclusion.

As noted above, the λHMW1 phage clones expressed a major immunoreactive band of 125 kDa, whereas the HMW1 plasmid clones pHMW1-4 and pHMW1-7, which contained what was believed to be the full-length gene, expressed an immunoreactive protein of approximately 160 kDa. This size discrepancy was disconcerting. One possible explanation was that an additional gene or genes necessary for correct processing of the HMW1 gene product were deleted in the process of subcloning. To address this possibility, plasmid pHMW1-14 was constructed. This construct was generated by digesting pHMW1 with NdeI and MluI and inserting the 7.6-kbp NdeI-MluI fragment isolated from pHMW1-4. Such a construct would contain the full-length HMW1 gene as well as the DNA 3' of the HMW1 gene which was present in the original HMW1 phage. E. coli transformed with this plasmid expressed major immunoreactive proteins with apparent molecular masses of 125 and 160 kDa as well as additional degradation products. The 125- and 160-kDa bands were identical to the major and minor immunoreactive bands detected in the HMW1 phage lysates. Interestingly, the pHMW1-14 construct also expressed significant amounts of protein in the uninduced condition, a situation not observed with the earlier constructs.

The relationship between the 125- and 160-kDa proteins remains somewhat unclear. Sequence analysis, described below, reveals that the HMW1 gene would be predicted to encode a protein of 159 kDa. It is believed that the 160-kDa protein is a precursor form of the mature 125-kDa protein, with the conversion from one protein to the other being dependent on the products of the two downstream genes.

Sequence analysis of the HMW1 gene (FIG. 1A-1G) revealed a 4,608-bp open reading frame (ORF), beginning with an ATG codon at nucleotide 351 and ending with a TAG stop codon at nucleotide 4959. A putative ribosome-binding site with the sequence AGGAG begins 10 bp upstream of the putative initiation codon. Five other in-frame ATG codons are located within 250 bp of the beginning of the ORF, but none of these is preceded by a typical ribosome-binding site. The 5'-flanking region of the ORF contains a series of direct tandem repeats, with the 7-bp sequence ATCTTTC repeated 16 times. These tandem repeats stop 100 bp 5' of the putative initiation codon. An 8-bp inverted repeat characteristic of a rho-independent transcriptional terminator is present, beginning at nucleotide 4983, 25 bp 3' of the presumed translational stop. Multiple termination codons are present in all three reading frames both upstream and downstream of the ORF. The derived amino acid sequence of the protein encoded by the HMW1 gene (FIG. 2A & 2B) has a molecular weight of 159,000, in good agreement with the apparent molecular weights of the proteins expressed by the pHMW1-4 and pHMW1-7 transformants. The derived amino acid sequence of the amino terminus does not demonstrate the characteristics of a typical signal sequence. The BamHI site used in generation of pHMW1 comprises bp 1743 through 1748 of the nucleotide sequence. The ORF downstream of the BamHI site would be predicted to encode a protein of 111 kDa, in good agreement with the 115 kDa estimated for the apparent molecular mass of the pHMW1-encoded fusion protein.
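The coordinates quoted for the HMW1 and HMW2 ORFs appear to measure ORF length from the first base of the ATG up to, but not including, the first base of the stop codon (4959 − 351 = 4,608 bp for HMW1; 4783 − 352 = 4,431 bp for HMW2). The Python sketch below is a hypothetical helper, not the software used in the original analysis: it reads codon by codon from a given ATG and reports where the first in-frame stop codon begins, under that same convention. It is demonstrated on a short synthetic sequence.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_inframe_stop(seq: str, start: int) -> Optional[int]:
    """Return the 1-based position of the first base of the first in-frame stop
    codon downstream of the ATG at 1-based position `start`, or None if the
    sequence ends before any stop codon is reached."""
    seq = seq.upper()
    i = start - 1  # convert to a 0-based index
    if seq[i:i + 3] != "ATG":
        raise ValueError(f"no ATG at position {start}")
    for j in range(i + 3, len(seq) - 2, 3):  # step one codon at a time, same frame
        if seq[j:j + 3] in STOP_CODONS:
            return j + 1  # back to 1-based
    return None


if __name__ == "__main__":
    # Synthetic example: ATG at position 3, three GCT codons, then TAG.
    demo = "CC" + "ATG" + "GCT" * 3 + "TAG" + "AA"
    stop = first_inframe_stop(demo, 3)
    print(stop, stop - 3)  # 15 12 (stop begins at 15; 12 bp = 4 codons precede it)
```

Under this convention the ORF length is simply the stop position minus the start position, which reproduces the 4,608-bp figure from the HMW1 coordinates.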

The sequence of the HMW2 gene (FIG. 3A-3G) consists of a 4,431-bp ORF, beginning with an ATG codon at nucleotide 352 and ending with a TAG stop codon at nucleotide 4783. The first 1,259 bp of the ORF of the HMW2 gene are identical to those of the HMW1 gene. Thereafter, the sequences begin to diverge but are 80% identical overall. With the exception of a single base addition at nucleotide 93 of the HMW2 sequence, the 5'-flanking regions of the HMW1 and HMW2 genes are identical for 310 bp upstream from the respective initiation codons. Thus, the HMW2 gene is preceded by the same set of tandem repeats and the same putative ribosome-binding site which lies 5' of the HMW1 gene. A putative transcriptional terminator identical to that identified 3' of the HMW1 ORF is noted, beginning at nucleotide 4804. The discrepancy in the lengths of the two genes is principally accounted for by a 186-bp gap in the HMW2 sequence, beginning at nucleotide position 3839. The derived amino acid sequence of the protein encoded by the HMW2 gene (FIG. 4A & 4B) has a molecular weight of 155,000 and is 71% identical with the derived amino acid sequence of the HMW1 gene.
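Percent identity values such as the 80% (DNA) and 71% (protein) figures depend on how the sequences were aligned and on how gaps, like the 186-bp gap in HMW2, are counted; the text does not state the convention used. The sketch below computes identity over two already-aligned sequences while skipping gapped columns, which is only one of several common conventions. The function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences, ignoring any column with a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gapped columns are excluded from both counts
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment with one gap column: 3 of the 4 ungapped columns match.
    print(percent_identity("AC-GT", "ACTGA"))  # 75.0
```

Counting the gapped column as a mismatch instead would give 60% for the same toy alignment, which is why the gap convention matters when comparing reported identities.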

The derived amino acid sequences of both the HMW1 and HMW2 genes (FIGS. 2A & 2B and 4A & 4B) demonstrated sequence similarity with the derived amino acid sequence of filamentous hemagglutinin of Bordetella pertussis, a surface-associated protein of this organism. The initial and optimized TFASTA scores for the HMW1-filamentous hemagglutinin sequence comparison were 87 and 186, respectively, with a word size of 2. The z score for the comparison was 45.8. The initial and optimized TFASTA scores for the HMW2-filamentous hemagglutinin sequence comparison were 68 and 196, respectively. The z score for the latter comparison was 48.7. The magnitudes of the initial and optimized TFASTA scores and the z scores suggested that a biologically significant relationship existed between the HMW1 and HMW2 gene products and filamentous hemagglutinin. When the derived amino acid sequences of HMW1, HMW2, and filamentous hemagglutinin genes were aligned and compared, the similarities were most notable at the amino-terminal ends of the three sequences. Twelve of the first 22 amino acids in the predicted peptide sequences were identical. In addition, the sequences demonstrated a common five-amino-acid stretch, Asn-Pro-Asn-Gly-Ile, and several shorter stretches of sequence identity within the first 200 amino acids.

EXAMPLE 2

To further explore the HMW1-filamentous hemagglutinin relationship, the ability of antiserum prepared against the recombinant protein expressed from pHMW1-4 (rHMW1) to recognize purified filamentous hemagglutinin was assessed. The rHMW1 antiserum demonstrated ELISA reactivity with filamentous hemagglutinin in a dose-dependent manner. Preimmune rabbit serum had minimal reactivity in this assay. The rHMW1 antiserum also was examined in a Western blot assay and demonstrated weak but positive reactivity with purified filamentous hemagglutinin in this system.

To identify the native Haemophilus protein corresponding to the HMW1 gene product, and to determine the extent to which proteins antigenically related to the HMW1 cloned gene product were common among other non-typeable H. influenzae strains, a panel of Haemophilus strains was screened by Western blot with the rHMW1 antiserum. The antiserum recognized both a 125- and a 120-kDa protein band in the homologous strain 12, the putative mature protein products of the HMW1 and HMW2 genes, respectively.

When used to screen heterologous non-typeable H. influenzae strains, rHMW1 antiserum recognized high-molecular-weight proteins in 75% of 125 epidemiologically unrelated strains. In general, the antiserum reacted with one or two protein bands in the 100- to 150-kDa range in each of the heterologous strains, in a pattern similar but not identical to that seen in the homologous strain.

Monoclonal antibody X3C is a murine IgG antibody directed against the filamentous hemagglutinin protein of B. pertussis. This antibody can inhibit the binding of B. pertussis cells to Chinese hamster ovary cells and HeLa cells in culture and will inhibit hemagglutination of erythrocytes by purified filamentous hemagglutinin. A Western blot assay was performed in which this monoclonal antibody was screened against the same panel of non-typeable H. influenzae strains discussed above. Monoclonal antibody X3C recognized both of the high-molecular-weight proteins in non-typeable H. influenzae strain 12 which were recognized by the recombinant-protein antiserum. In addition, the monoclonal antibody recognized protein bands in a subset of heterologous non-typeable H. influenzae strains which were identical to those recognized by the recombinant-protein antiserum. On occasion, the filamentous hemagglutinin monoclonal antibody appeared to recognize only one of the two bands which had been recognized by the recombinant-protein antiserum. Overall, monoclonal antibody X3C recognized high-molecular-weight protein bands identical to those recognized by the rHMW1 antiserum in approximately 35% of our collection of non-typeable H. influenzae strains.

EXAMPLE 3

Mutants deficient in expression of HMW1, HMW2 or both proteins were constructed to examine the role of these proteins in bacterial adherence. The following strategy was employed. pHMW1-14 (see Example 1, FIG. 5A) was digested with BamHI and then ligated to a kanamycin cassette isolated on a 1.3-kb BamHI fragment from pUC4K. The resultant plasmid (pHMW1-17) was linearized by digestion with XbaI and transformed into non-typeable H. influenzae strain 12, followed by selection for kanamycin-resistant colonies. Southern analysis of a series of these colonies demonstrated two populations of transformants, one with an insertion in the HMW1 structural gene and the other with an insertion in the HMW2 structural gene. One mutant from each of these classes was selected for further studies.

Mutants deficient in expression of both proteins were recovered using the following protocol. After deletion of the 2.1-kb fragment of DNA between two EcoRI sites spanning the 3' portion of the HMW1 structural gene in pHMW-15, the kanamycin cassette from pUC4K was inserted as a 1.3-kb EcoRI fragment. The resulting plasmid (pHMW1-16) was linearized by digestion with XbaI and transformed into strain 12, followed again by selection for kanamycin-resistant colonies. Southern analysis of a representative sampling of these colonies demonstrated that in seven of eight cases, insertion into both the HMW1 and HMW2 loci had occurred. One such mutant was selected for further studies.

To confirm the intended phenotypes, the mutant strains were examined by Western blot analysis with a polyclonal antiserum against recombinant HMW1 protein. The parental strain expressed both the 125-kD HMW1 and the 120-kD HMW2 protein. In contrast, the HMW2 mutant failed to express the 120-kD protein, and the HMW1 mutant failed to express the 125-kD protein. The double mutant expressed neither protein. On the basis of whole cell lysates, outer membrane profiles, and colony morphology, the wild type strain and the mutants were otherwise identical with one another. Transmission electron microscopy demonstrated that none of the four strains expressed pili.

The capacity of wild type strain 12 to adhere to Chang epithelial cells was examined. In such assays, bacteria were inoculated into broth and allowed to grow to a density of ~2×10⁹ cfu/ml. Approximately 2×10⁷ cfu were inoculated onto epithelial cell monolayers, and plates were gently centrifuged at 165 × g for 5 minutes to facilitate contact between bacteria and the epithelial surface. After incubation for 30 minutes at 37° C. in 5% CO₂, monolayers were rinsed 5 times with PBS to remove nonadherent organisms and were treated with trypsin-EDTA (0.05% trypsin, 0.5% EDTA) in PBS to release them from the plastic support. Well contents were agitated, and dilutions were plated on solid medium to yield the number of adherent bacteria per monolayer. Percent adherence was calculated by dividing the number of adherent cfu per monolayer by the number of inoculated cfu.
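The percent-adherence calculation described above reduces to a ratio of two cfu counts. The sketch below is a minimal Python illustration, assuming that adherent cfu are back-calculated from a single plated dilution of the well contents; the helper names, the plated volume, the well volume and the colony count are hypothetical and are not taken from the experiments.

```python
def cfu_per_monolayer(colonies: int, dilution_factor: float,
                      plated_volume_ml: float, well_volume_ml: float) -> float:
    """Estimate total cfu released from one monolayer (hypothetical helper).

    Colonies counted on a plate, times the dilution factor, divided by the
    volume plated gives cfu/ml; multiplying by the well volume gives cfu/well.
    """
    cfu_per_ml = colonies * dilution_factor / plated_volume_ml
    return cfu_per_ml * well_volume_ml


def percent_adherence(adherent_cfu: float, inoculated_cfu: float) -> float:
    """Adherent cfu per monolayer divided by inoculated cfu, as a percentage."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated_cfu must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


# Hypothetical counts: 175 colonies from 0.1 ml of a 10^4-fold dilution of
# well contents made up to 1 ml, against an inoculum of 2 x 10^7 cfu.
adherent = cfu_per_monolayer(175, 1e4, 0.1, 1.0)
print(f"{percent_adherence(adherent, 2e7):.1f}% of inoculum adhered")
```

Keeping the back-calculation separate from the ratio makes it easy to swap in whatever dilution scheme a given laboratory actually uses.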

As depicted in Table 1 below (the Tables appear at the end of the descriptive text), this strain adhered quite efficiently, with nearly 90% of the inoculum binding to the monolayer. Adherence by the mutant expressing HMW1 but not HMW2 (HMW2⁻) was also quite efficient and comparable to that by the wild type strain. In contrast, attachment by the strain expressing HMW2 but deficient in expression of HMW1 (HMW1⁻) was decreased about 15-fold relative to the wild type. Adherence by the double mutant (HMW1⁻/HMW2⁻) was decreased even further, approximately 50-fold compared with the wild type and approximately 3-fold compared with the HMW1 mutant. Considered together, these results suggest that both the HMW1 protein and the HMW2 protein influence attachment to Chang epithelial cells. Interestingly, optimal adherence to this cell line appears to require HMW1 but not HMW2.

EXAMPLE 4

Using the plasmids pHMW1-16 and pHMW1-17 (see Example 3) and following a scheme similar to that employed with strain 12 as described in Example 3, three non-typeable Haemophilus strain 5 mutants were isolated: one with the kanamycin gene inserted into the hmw1-like (designated hmw3) locus, a second with an insertion in the hmw2-like (designated hmw4) locus, and a third with insertions in both loci. As predicted, Western immunoblot analysis demonstrated that the mutant with insertion of the kanamycin cassette into the hmw1-like locus had lost expression of the HMW3 125-kD protein, while the mutant with insertion into the hmw2-like locus failed to express the HMW4 123-kD protein. The mutant with a double insertion was unable to express either of the high molecular weight proteins.

As shown in Table 1 below, wild type strain 5 demonstrated high-level adherence, with almost 80% of the inoculum adhering per monolayer. Adherence by the mutant deficient in expression of the HMW2-like protein was also quite high. In contrast, adherence by the mutant unable to express the HMW1-like protein was reduced about 5-fold relative to the wild type, and attachment by the double mutant was diminished even further (approximately 25-fold). Examination of Giemsa-stained samples confirmed these observations (not shown). Thus, the results with strain 5 corroborate the findings with strain 12 and the HMW1 and HMW2 proteins.
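The fold changes quoted in Examples 3 and 4 can be recomputed from the Table 1 means (% of inoculum adhering). The following minimal Python sketch divides wild-type adherence by mutant adherence and ignores the standard errors. Recomputed this way, the strain 12 HMW1⁻ and double-mutant reductions come out at about 15-fold and 44-fold, and the strain 5 HMW1-like and double-mutant reductions at about 5-fold and 22-fold; the text gives these as approximate figures.

```python
# Mean adherence (% of inoculum) for each strain, copied from Table 1.
TABLE_1 = {
    "strain 12": {
        "wild type": 87.7,
        "HMW1- mutant": 6.0,
        "HMW2- mutant": 89.9,
        "HMW1-/HMW2- mutant": 2.0,
    },
    "strain 5": {
        "wild type": 78.7,
        "HMW1-like mutant": 15.7,
        "HMW2-like mutant": 103.7,
        "double mutant": 3.5,
    },
}


def fold_reduction(reference: float, mutant: float) -> float:
    """Ratio of reference adherence to mutant adherence (>1 means the mutant adheres less)."""
    return reference / mutant


for strain, means in TABLE_1.items():
    wild_type = means["wild type"]
    for name, value in means.items():
        if name == "wild type":
            continue
        print(f"{strain}, {name}: wild type / mutant = {fold_reduction(wild_type, value):.1f}")

# Double mutant compared with the single HMW1- mutant of strain 12.
s12 = TABLE_1["strain 12"]
print(f"strain 12, HMW1- / double = {fold_reduction(s12['HMW1- mutant'], s12['HMW1-/HMW2- mutant']):.1f}")
```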

EXAMPLE 5

To confirm an adherence function for the HMW1 and HMW2 proteins and to examine the effect of HMW1 and HMW2 independently of other H. influenzae surface structures, the hmw1 and hmw2 gene clusters were introduced into E. coli DH5α using plasmids pHMW1-14 and pHMW2-21, respectively. As a control, the cloning vector, pT7-7, was also transformed into E. coli DH5α. Western blot analysis demonstrated that E. coli DH5α containing the hmw1 genes expressed a 125-kDa protein, while the same strain harboring the hmw2 genes expressed a 120-kDa protein. E. coli DH5α containing pT7-7 failed to react with antiserum against recombinant HMW1. Transmission electron microscopy revealed no pili or other surface appendages on any of the E. coli strains.

Adherence by the E. coli strains was quantitated and compared with adherence by wild type non-typeable H. influenzae strain 12. As shown in Table 2 below, adherence by E. coli DH5α containing vector alone was less than 1% of that for strain 12. In contrast, E. coli DH5α harboring the hmw1 gene cluster demonstrated adherence levels comparable to those for strain 12. Adherence by E. coli DH5α containing the hmw2 genes was approximately 6-fold lower than attachment by strain 12 but was increased 20-fold over adherence by E. coli DH5α with pT7-7 alone. These results indicate that the HMW1 and HMW2 proteins are each capable of independently mediating attachment to Chang conjunctival cells. They are consistent with the results with the H. influenzae mutants reported in Examples 3 and 4, providing further evidence that, with Chang epithelial cells, HMW1 is a more efficient adhesin than HMW2.

Experiments with E. coli HB101 harboring pT7-7, pHMW1-14, or pHMW2-21 confirmed the results obtained with the DH5α derivatives (see Table 2).

EXAMPLE 6

HMW1 and HMW2 were isolated and purified from non-typeable H. influenzae (NTHI) strain 12 in the following manner. Non-typeable Haemophilus bacteria from frozen stock culture were streaked onto a chocolate plate and grown overnight at 37° C. in an incubator with 5% CO₂. A 50-ml starter culture of brain heart infusion (BHI) broth, supplemented with 10 μg/ml each of hemin and NAD, was inoculated with growth from the chocolate plate. The starter culture was grown until the optical density (O.D. at 600 nm) reached 0.6 to 0.8, and the bacteria in the starter culture were then used to inoculate six 500-ml flasks of supplemented BHI, using 8 to 10 ml per flask. The bacteria were grown in the 500-ml flasks for an additional 5 to 6 hours, at which time the O.D. was 1.5 or greater. Cultures were centrifuged at 10,000 rpm for 10 minutes.

Bacterial pellets were resuspended in a total volume of 250 ml of an extraction solution comprising 0.5M NaCl, 0.01M Na₂EDTA, 0.01M Tris and 50 μM 1,10-phenanthroline, pH 7.5. The cells were not sonicated or otherwise disrupted. The resuspended cells were allowed to sit on ice at 0° C. for 60 minutes and were then centrifuged at 10,000 rpm for 10 minutes at 4° C. to remove the majority of intact cells and cellular debris. The supernatant was collected and centrifuged at 100,000 × g for 60 minutes at 4° C. The supernatant was again collected and dialyzed overnight at 4° C. against 0.01M sodium phosphate, pH 6.0.
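For making up the 250 ml of extraction solution, each reagent mass follows from mass = molarity × volume × formula weight. The sketch below assumes particular reagent forms (Na₂EDTA dihydrate, Tris base titrated to pH 7.5 separately, 1,10-phenanthroline monohydrate), none of which the text specifies; the formula weight printed on the reagent actually used should be substituted.

```python
# Formula weights (g/mol) for ASSUMED reagent forms; the text does not say
# which hydrate or salt of each reagent was used.
FORMULA_WEIGHT = {
    "NaCl": 58.44,
    "Na2EDTA.2H2O": 372.24,
    "Tris base": 121.14,
    "1,10-phenanthroline.H2O": 198.22,
}

# Final concentrations (mol/L) of the extraction solution described above.
CONCENTRATION_M = {
    "NaCl": 0.5,
    "Na2EDTA.2H2O": 0.01,
    "Tris base": 0.01,
    "1,10-phenanthroline.H2O": 50e-6,
}


def grams_needed(concentration_m: float, volume_l: float, formula_weight: float) -> float:
    """Mass (g) = molarity (mol/L) x volume (L) x formula weight (g/mol)."""
    return concentration_m * volume_l * formula_weight


volume_l = 0.250  # 250 ml total, as in the text
for reagent, conc in CONCENTRATION_M.items():
    grams = grams_needed(conc, volume_l, FORMULA_WEIGHT[reagent])
    print(f"{reagent:26s} {grams * 1000:9.2f} mg")
# The pH adjustment to 7.5 (e.g. with HCl) is not modeled here.
```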

The sample was centrifuged at 10,000 rpm for 10 minutes at 4° C. to remove insoluble debris precipitated from solution during dialysis. The supernatant was applied to a 10-ml CM Sepharose column that had been pre-equilibrated with 0.01M sodium phosphate, pH 6. Following application of the sample, the column was washed with 0.01M sodium phosphate. Proteins were eluted from the column with a 0-0.5M KCl gradient in 0.01M sodium phosphate, pH 6, and fractions were collected for gel examination. Coomassie-stained gels of the column fractions were run to identify fractions containing high molecular weight proteins. These fractions were pooled and concentrated to a volume of 1 to 3 ml in preparation for application to a gel filtration column.
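When recording where the high molecular weight proteins elute, it can help to estimate the KCl concentration in each gradient fraction. The sketch below assumes, hypothetically, a linear 0 to 0.5M gradient collected as equal-volume fractions with no lag from column dead volume; the function name and the 20-fraction run are illustrative and not taken from the text.

```python
def kcl_molarity_at_fraction(index: int, n_fractions: int,
                             start_m: float = 0.0, end_m: float = 0.5) -> float:
    """Approximate mean KCl concentration (M) in one gradient fraction.

    Assumes a linear gradient delivered over n_fractions equal-volume
    fractions numbered from 0, and returns the value at the fraction midpoint.
    """
    if not 0 <= index < n_fractions:
        raise ValueError("index out of range")
    return start_m + (end_m - start_m) * (index + 0.5) / n_fractions


# Hypothetical run: the 0-0.5 M gradient split into 20 fractions.
for i in range(0, 20, 5):
    print(i, round(kcl_molarity_at_fraction(i, 20), 4))
```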

A Sepharose CL-4B gel filtration column was equilibrated with phosphate-buffered saline, pH 7.5. The concentrated high molecular weight protein sample was applied to the gel filtration column and column fractions were collected. Coomassie-stained gels were run on the column fractions to identify those containing high molecular weight proteins, and these fractions were pooled.

The proteins were tested to determine whether they would protect against experimental otitis media caused by the homologous strain.

Chinchillas received three monthly subcutaneous injections of 40 μg of an HMW1-HMW2 protein mixture in Freund's adjuvant. One month after the last injection, the animals were challenged by intrabullar inoculation with 300 cfu of NTHI strain 12.

Infection developed in 5 of 5 control animals versus 5 of 10 immunized animals. Among infected animals, geometric mean bacterial counts in middle ear fluid 7 days post-challenge were 7.4×10⁶ in control animals versus 1.3×10⁵ in immunized animals.
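Geometric means are the usual summary for bacterial counts that span several orders of magnitude: the geometric mean is the antilog of the mean of the log counts. A minimal Python sketch follows; the per-animal counts in it are hypothetical, since individual values are not reported. Applied to the two reported geometric means, the difference is about 57-fold, or roughly 1.76 log10 units.

```python
import math
from typing import Sequence


def geometric_mean(counts: Sequence[float]) -> float:
    """Geometric mean = 10 ** (mean of log10 counts); counts must be positive."""
    if not counts or any(c <= 0 for c in counts):
        raise ValueError("counts must be a non-empty sequence of positive numbers")
    return 10 ** (sum(math.log10(c) for c in counts) / len(counts))


# Hypothetical per-animal counts, only to show the calculation.
print(f"{geometric_mean([1e5, 1e6, 1e7]):.3g}")

# Comparing the two geometric means reported in the text.
control, immunized = 7.4e6, 1.3e5
print(f"{control / immunized:.0f}-fold lower, "
      f"{math.log10(control) - math.log10(immunized):.2f} log10 units")
```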

Serum antibody titres following immunization were comparable in uninfected and infected animals. However, infection in immunized animals was uniformly associated with the appearance of bacteria with down-regulated expression of the HMW proteins, suggesting bacterial selection in response to immunologic pressure.

Although these data show that protection following immunization was not complete, they suggest that the HMW adhesin proteins are potentially important protective antigens, which may comprise one component of a multi-component NTHI vaccine.

EXAMPLE 7

A number of synthetic peptides were derived from HMW1, and antisera were then raised to these peptides. The antiserum to peptide HMW1-P5 was shown to recognize HMW1. Peptide HMW1-P5 covers amino acids 1453 to 1481 of HMW1, has the sequence VDEVIEAKRILEKVKDLSDEEREALAKLG (SEQ ID NO:9), and represents bases 1498 to 1576 in FIG. 10A-10L.

This finding demonstrates that the DNA sequence and the derived protein sequence are being interpreted in the correct reading frame, and that immunogenic peptides can be derived from the sequence.
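The reading-frame argument can also be checked directly against the sequence listing: an 87-nucleotide stretch spanning the SEQ ID NO:1 listing lines numbered 4740 and 4800 translates, in frame, to the HMW1-P5 sequence. The Python sketch below builds the standard genetic code table and performs that translation. The stretch was copied by hand from the listing, and the check uses SEQ ID NO:1 rather than the FIG. 10A-10L coordinates.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (first, second, third base).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes ('*' = stop)."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# 87-nt stretch copied from the SEQ ID NO:1 listing (it spans two listing lines).
STRETCH = ("GTAGATGAAGTAATTGAAGCGAAACGCATCCTTG"
           "AGAAGGTAAAAGATTTATCTGATGAAGAAAGAGAAGCGTTAGCTAAACTTGGA")
HMW1_P5 = "VDEVIEAKRILEKVKDLSDEEREALAKLG"  # SEQ ID NO:9

print(translate(STRETCH) == HMW1_P5)
```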

SUMMARY OF DISCLOSURE

In summary of this disclosure, the present invention provides high molecular weight proteins of non-typeable Haemophilus, genes coding for the same, and vaccines incorporating such proteins. Modifications are possible within the scope of this invention.

TABLE 1

Effect of mutation of high molecular weight proteins on adherence to Chang epithelial cells by non-typeable H. influenzae.

______________________________________________________________
                                  ADHERENCE*
Strain                   % inoculum       relative to wild type†
______________________________________________________________
Strain 12 derivatives
  wild type              87.7 ± 5.9       100.0 ± 6.7
  HMW1⁻ mutant            6.0 ± 0.9         6.8 ± 1.0
  HMW2⁻ mutant           89.9 ± 10.8      102.5 ± 12.3
  HMW1⁻/HMW2⁻ mutant      2.0 ± 0.3         2.3 ± 0.3
Strain 5 derivatives
  wild type              78.7 ± 3.2       100.0 ± 4.1
  HMW1-like mutant       15.7 ± 2.6        19.9 ± 3.3
  HMW2-like mutant      103.7 ± 14.0      131.7 ± 17.8
  double mutant           3.5 ± 0.6         4.4 ± 0.8
______________________________________________________________
*Numbers represent the mean (± standard error of the mean) of measurements in triplicate or quadruplicate from representative experiments.
†Adherence values for strain 12 derivatives are relative to strain 12 wild type; values for strain 5 derivatives are relative to strain 5 wild type.

TABLE 2

Adherence by E. coli DH5α and HB101 harboring hmw1 or hmw2 gene clusters.

______________________________________________________________
Strain*               Adherence relative to H. influenzae strain 12†
______________________________________________________________
DH5α (pT7-7)            0.7 ± 0.02
DH5α (pHMW1-14)       114.2 ± 15.9
DH5α (pHMW2-21)        14.0 ± 3.7
HB101 (pT7-7)           1.2 ± 0.5
HB101 (pHMW1-14)       93.6 ± 15.8
HB101 (pHMW2-21)        3.6 ± 0.9
______________________________________________________________
*The plasmid pHMW1-14 contains the hmw1 gene cluster, while pHMW2-21 contains the hmw2 gene cluster; pT7-7 is the cloning vector used in these constructs.
†Numbers represent the mean (± standard error of the mean) of measurements made in triplicate from representative experiments.

    __________________________________________________________________________    SEQUENCE LISTING                                                              (1) GENERAL INFORMATION:                                                      (iii) NUMBER OF SEQUENCES: 8                                                  (2) INFORMATION FOR SEQ ID NO:1:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 5116 base pairs                                                   (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:                                       ACAGCGTTCTCTTAATACTAGTACAAACCCACAATAAAATATGACAAACAACAATTACAA60                CACCTTTTTTGCAGTCTATATGCAAATATTTTAAAAAATAGTATAAATCCGCCATATAAA120               ATGGTATAATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATC180               TTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTC240               ACATGCCCTGATGAACCGAGGGAAGGGAGGGAGGGGCAAGAATGAAGAGGGAGCTGAACG300               AACGCAAATGATAAAGTAATTTAATTGTTCAACTAACCTTAGGAGAAAATATGAACAAGC360               TATATCGTCTCAAATTCAGCAAACGCCTGAATGCTTTGGTTGCTGTGTCTGAATTGGCAC420               GGGGTTGTGACCATTCCACAGAAAAAGGCAGCGAAAAACCTGCTCGCATGAAAGTGCGTC480               ACTTAGCGTTAAAGCCACTTTCCGCTATGTTACTATCTTTAGGTGTAACATCTATTCCAC540               AATCTGTTTTAGCAAGCGGCTTACAAGGAATGGATGTAGTACACGGCACAGCCACTATGC600               AAGTAGATGGTAATAAAACCATTATCCGCAACAGTGTTGACGATATCATTAATTGGAAAC660               AATTTAACATCGACCAAAATGAAATGGTGCAGTTTTTACAAGAAAACAACAACTCCGCCG720               TATTCAACCGTGTTACATCTAACCAAATCTCCCAATTAAAAGGGATTTTAGATTCTAACG780               
GACAAGTCTTTTTAATCAACCCAAATGGTATCACAATAGGTAAAGACGCAATTATTAACA840               CTAATGGCTTTACGGCTTCTACGCTAGACATTTCTAACGAAAACATCAAGGCGCGTAATT900               TCACCTTCGAGCAAACCAAAGATAAAGCGCTCGCTGAAATTGTGAATCACGGTTTAATTA960               CTGTCGGTAAAGACGGCAGTGTAAATCTTATTGGTGGCAAAGTGAAAAACGAGGGTGTGA1020              TTAGCGTAAATGGTGGCAGCATTTCTTTACTCGCAGGGCAAAAAATCACCATCAGCGATA1080              TAATAAACCCAACCATTACTTACAGCATTGCCGCGCCTGAAAATGAAGCGGTCAATCTGG1140              GCGATATTTTTGCCAAAGGCGGTAACATTAATGTCCGTGCTGCCACTATTCGAAACCAAG1200              GTAAACTTTCTGCTGATTCTGTAAGCAAAGATAAAAGCGGCAATATTGTTCTTTCCGCCA1260              AAGAGGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTCAAAATCAGCAAGCTAAAGGCG1320              GCAAGCTGATGATTACAGGCGATAAAGTCACATTAAAAACAGGTGCAGTTATCGACCTTT1380              CAGGTAAAGAAGGGGGAGAAACTTACCTTGGCGGTGACGAGCGCGGCGAAGGTAAAAAGG1440              GCATTCAATTAGCAAAGAAAACCTCTTTAGAAAAAGGCTCAACCATCAATGTATCAGGCA1500              AAGAAAAAGGCGGACGCGCTATTGTGTGGGGCGATATTGCGTTAATTGACGGCAATATTA1560              ACGCTCAAGGTAGTGGTGATATCGCTAAAACCGGTGGTTTTGTGGAGACGTCGGGGCATG1620              ATTTATTCATCAAAGACAATGCAATTGTTGACGCCAAAGAGTGGTTGTTAGACCCGGATA1680              ATGTATCTATTAATGCAGAAACAGCAGGACGCAGCAATACTTCAGAAGACGATGAATACA1740              CGGGATCCGGGAATAGTGCCAGCACCCCAAAACGAAACAAAGAAAAGACAACATTAACAA1800              ACACAACTCTTGAGAGTATACTAAAAAAAGGTACCTTTGTTAACATCACTGCTAATCAAC1860              GCATCTATGTCAATAGCTCCATTAATTTATCCAATGGCAGCTTAACTCTTTGGAGTGAGG1920              GTCGGAGCGGTGGCGGCGTTGAGATTAACAACGATATTACCACCGGTGATGATACCAGAG1980              GTGCAAACTTAACAATTTACTCAGGCGGCTGGGTTGATGTTCATAAAAATATCTCACTCG2040              GGGCGCAAGGTAACATAAACATTACAGCTAAACAAGATATCGCCTTTGAGAAAGGAAGCA2100              ACCAAGTCATTACAGGTCAAGGGACTATTACCTCAGGCAATCAAAAAGGTTTTAGATTTA2160              ATAATGTCTCTCTAAACGGCACTGGCAGCGGACTGCAATTCACCACTAAAAGAACCAATA2220              AATACGCTATCACAAATAAATTTGAAGGGACTTTAAATATTTCAGGGAAAGTGAACATCT2280              
CAATGGTTTTACCTAAAAATGAAAGTGGATATGATAAATTCAAAGGACGCACTTACTGGA2340              ATTTAACCTCCTTAAATGTTTCCGAGAGTGGCGAGTTTAACCTCACTATTGACTCCAGAG2400              GAAGCGATAGTGCAGGCACACTTACCCAGCCTTATAATTTAAACGGTATATCATTCAACA2460              AAGACACTACCTTTAATGTTGAACGAAATGCAAGAGTCAACTTTGACATCAAGGCACCAA2520              TAGGGATAAATAAGTATTCTAGTTTGAATTACGCATCATTTAATGGAAACATTTCAGTTT2580              CGGGAGGGGGGAGTGTTGATTTCACACTTCTCGCCTCATCCTCTAACGTCCAAACCCCCG2640              GTGTAGTTATAAATTCTAAATACTTTAATGTTTCAACAGGGTCAAGTTTAAGATTTAAAA2700              CTTCAGGCTCAACAAAAACTGGCTTCTCAATAGAGAAAGATTTAACTTTAAATGCCACCG2760              GAGGCAACATAACACTTTTGCAAGTTGAAGGCACCGATGGAATGATTGGTAAAGGCATTG2820              TAGCCAAAAAAAACATAACCTTTGAAGGAGGTAACATCACCTTTGGCTCCAGGAAAGCCG2880              TAACAGAAATCGAAGGCAATGTTACTATCAATAACAACGCTAACGTCACTCTTATCGGTT2940              CGGATTTTGACAACCATCAAAAACCTTTAACTATTAAAAAAGATGTCATCATTAATAGCG3000              GCAACCTTACCGCTGGAGGCAATATTGTCAATATAGCCGGAAATCTTACCGTTGAAAGTA3060              ACGCTAATTTCAAAGCTATCACAAATTTCACTTTTAATGTAGGCGGCTTGTTTGACAACA3120              AAGGCAATTCAAATATTTCCATTGCCAAAGGAGGGGCTCGCTTTAAAGACATTGATAATT3180              CCAAGAATTTAAGCATCACCACCAACTCCAGCTCCACTTACCGCACTATTATAAGCGGCA3240              ATATAACCAATAAAAACGGTGATTTAAATATTACGAACGAAGGTAGTGATACTGAAATGC3300              AAATTGGCGGCGATGTCTCGCAAAAAGAAGGTAATCTCACGATTTCTTCTGACAAAATCA3360              ATATTACCAAACAGATAACAATCAAGGCAGGTGTTGATGGGGAGAATTCCGATTCAGACG3420              CGACAAACAATGCCAATCTAACCATTAAAACCAAAGAATTGAAATTAACGCAAGACCTAA3480              ATATTTCAGGTTTCAATAAAGCAGAGATTACAGCTAAAGATGGTAGTGATTTAACTATTG3540              GTAACACCAATAGTGCTGATGGTACTAATGCCAAAAAAGTAACCTTTAACCAGGTTAAAG3600              ATTCAAAAATCTCTGCTGACGGTCACAAGGTGACACTACACAGCAAAGTGGAAACATCCG3660              GTAGTAATAACAACACTGAAGATAGCAGTGACAATAATGCCGGCTTAACTATCGATGCAA3720              AAAATGTAACAGTAAACAACAATATTACTTCTCACAAAGCAGTGAGCATCTCTGCGACAA3780              
GTGGAGAAATTACCACTAAAACAGGTACAACCATTAACGCAACCACTGGTAACGTGGAGA3840              TAACCGCTCAAACAGGTAGTATCCTAGGTGGAATTGAGTCCAGCTCTGGCTCTGTAACAC3900              TTACTGCAACCGAGGGCGCTCTTGCTGTAAGCAATATTTCGGGCAACACCGTTACTGTTA3960              CTGCAAATAGCGGTGCATTAACCACTTTGGCAGGCTCTACAATTAAAGGAACCGAGAGTG4020              TAACCACTTCAAGTCAATCAGGCGATATCGGCGGTACGATTTCTGGTGGCACAGTAGAGG4080              TTAAAGCAACCGAAAGTTTAACCACTCAATCCAATTCAAAAATTAAAGCAACAACAGGCG4140              AGGCTAACGTAACAAGTGCAACAGGTACAATTGGTGGTACGATTTCCGGTAATACGGTAA4200              ATGTTACGGCAAACGCTGGCGATTTAACAGTTGGGAATGGCGCAGAAATTAATGCGACAG4260              AAGGAGCTGCAACCTTAACTACATCATCGGGCAAATTAACTACCGAAGCTAGTTCACACA4320              TTACTTCAGCCAAGGGTCAGGTAAATCTTTCAGCTCAGGATGGTAGCGTTGCAGGAAGTA4380              TTAATGCCGCCAATGTGACACTAAATACTACAGGCACTTTAACTACCGTGAAGGGTTCAA4440              ACATTAATGCAACCAGCGGTACCTTGGTTATTAACGCAAAAGACGCTGAGCTAAATGGCG4500              CAGCATTGGGTAACCACACAGTGGTAAATGCAACCAACGCAAATGGCTCCGGCAGCGTAA4560              TCGCGACAACCTCAAGCAGAGTGAACATCACTGGGGATTTAATCACAATAAATGGATTAA4620              ATATCATTTCAAAAAACGGTATAAACACCGTACTGTTAAAAGGCGTTAAAATTGATGTGA4680              AATACATTCAACCGGGTATAGCAAGCGTAGATGAAGTAATTGAAGCGAAACGCATCCTTG4740              AGAAGGTAAAAGATTTATCTGATGAAGAAAGAGAAGCGTTAGCTAAACTTGGAGTAAGTG4800              CTGTACGTTTTATTGAGCCAAATAATACAATTACAGTCGATACACAAAATGAATTTGCAA4860              CCAGACCATTAAGTCGAATAGTGATTTCTGAAGGCAGGGCGTGTTTCTCAAACAGTGATG4920              GCGCGACGGTGTGCGTTAATATCGCTGATAACGGGCGGTAGCGGTCAGTAATTGACAAGG4980              TAGATTTCATCCTGCAATGAAGTCATTTTATTTTCGTATTATTTACTGTGTGGGTTAAAG5040              TTCAGTACGGGCTTTACCCATCTTGTAAAAAATTACGGAGAATACAATAAAGTATTTTTA5100              ACAGGTTATTATTATG5116                                                          (2) INFORMATION FOR SEQ ID NO:2:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 1536 amino acids                      
                            (B) TYPE: amino acid                                                          (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:                                       MetAsnLysIleTyrArgLeuLysPheSerLysArgLeuAsnAlaLeu                              151015                                                                        ValAlaValSerGluLeuAlaArgGlyCysAspHisSerThrGluLys                              202530                                                                        GlySerGluLysProAlaArgMetLysValArgHisLeuAlaLeuLys                              354045                                                                        ProLeuSerAlaMetLeuLeuSerLeuGlyValThrSerIleProGln                              505560                                                                        SerValLeuAlaSerGlyLeuGlnGlyMetAspValValHisGlyThr                              65707580                                                                      AlaThrMetGlnValAspGlyAsnLysThrIleIleArgAsnSerVal                              859095                                                                        AspAlaIleIleAsnTrpLysGlnPheAsnIleAspGlnAsnGluMet                              100105110                                                                     ValGlnPheLeuGlnGluAsnAsnAsnSerAlaValPheAsnArgVal                              115120125                                                                     ThrSerAsnGlnIleSerGlnLeuLysGlyIleLeuAspSerAsnGly                              130135140                                                                     GlnValPheLeuIleAsnProAsnGlyIleThrIleGlyLysAspAla                              145150155160                                                                  
IleIleAsnThrAsnGlyPheThrAlaSerThrLeuAspIleSerAsn                              165170175                                                                     GluAsnIleLysAlaArgAsnPheThrPheGluGlnThrLysAspLys                              180185190                                                                     AlaLeuAlaGluIleValAsnHisGlyLeuIleThrValGlyLysAsp                              195200205                                                                     GlySerValAsnLeuIleGlyGlyLysValLysAsnGluGlyValIle                              210215220                                                                     SerValAsnGlyGlySerIleSerLeuLeuAlaGlyGlnLysIleThr                              225230235240                                                                  IleSerAspIleIleAsnProThrIleThrTyrSerIleAlaAlaPro                              245250255                                                                     GluAsnGluAlaValAsnLeuGlyAspIlePheAlaLysGlyGlyAsn                              260265270                                                                     IleAsnValArgAlaAlaThrIleArgAsnGlnGlyLysLeuSerAla                              275280285                                                                     AspSerValSerLysAspLysSerGlyAsnIleValLeuSerAlaLys                              290295300                                                                     GluGlyGluAlaGluIleGlyGlyValIleSerAlaGlnAsnGlnGln                              305310315320                                                                  AlaLysGlyGlyLysLeuMetIleThrGlyAspLysValThrLeuLys                              325330335                                                                     ThrGlyAlaValIleAspLeuSerGlyLysGluGlyGlyGluThrTyr                              340345350                                                                     LeuGlyGlyAspGluArgGlyGluGlyLysAsnGlyIleGlnLeuAla                              355360365                                         
                            LysLysThrSerLeuGluLysGlySerThrIleAsnValSerGlyLys                              370375380                                                                     GluLysGlyGlyArgAlaIleValTrpGlyAspIleAlaLeuIleAsp                              385390395400                                                                  GlyAsnIleAsnAlaGlnGlySerGlyAspIleAlaLysThrGlyGly                              405410415                                                                     PheValGluThrSerGlyHisAspLeuPheIleLysAspAsnAlaIle                              420425430                                                                     ValAspAlaLysGluTrpLeuLeuAspPheAspAsnValSerIleAsn                              435440445                                                                     AlaGluThrAlaGlyArgSerAsnThrSerGluAspAspGluTyrThr                              450455460                                                                     GlySerGlyAsnSerAlaSerThrProLysArgAsnLysGluLysThr                              465470475480                                                                  ThrLeuThrAsnThrThrLeuGluSerIleLeuLysLysGlyThrPhe                              485490495                                                                     ValAsnIleThrAlaAsnGlnArgIleTyrValAsnSerSerIleAsn                              500505510                                                                     LeuSerAsnGlySerLeuThrLeuTrpSerGluGlyArgSerGlyGly                              515520525                                                                     GlyValGluIleAsnAsnAspIleThrThrGlyAspAspThrArgGly                              530535540                                                                     AlaAsnLeuThrIleTyrSerGlyGlyTrpValAspValHisLysAsn                              545550555560                                                                  IleSerLeuGlyAlaGlnGlyAsnIleAsnIleThrAlaLysGlnAsp                              565570575             
                                                        IleAlaPheGluLysGlySerAsnGlnValIleThrGlyGlnGlyThr                              580585590                                                                     IleThrSerGlyAsnGlnLysGlyPheArgPheAsnAsnValSerLeu                              595600605                                                                     AsnGlyThrGlySerGlyLeuGlnPheThrThrLysArgThrAsnLys                              610615620                                                                     TyrAlaIleThrAsnLysPheGluGlyThrLeuAsnIleSerGlyLys                              625630635640                                                                  ValAsnIleSerMetValLeuProLysAsnGluSerGlyTyrAspLys                              645650655                                                                     PheLysGlyArgThrTyrTrpAsnLeuThrSerLeuAsnValSerGlu                              660665670                                                                     SerGlyGluPheAsnLeuThrIleAspSerArgGlySerAspSerAla                              675680685                                                                     GlyThrLeuThrGlnProTyrAsnLeuAsnGlyIleSerPheAsnLys                              690695700                                                                     AspThrThrPheAsnValGluArgAsnAlaArgValAsnPheAspIle                              705710715720                                                                  LysAlaProIleGlyIleAsnLysTyrSerSerLeuAsnTyrAlaSer                              725730735                                                                     PheAsnGlyAsnIleSerValSerGlyGlyGlySerValAspPheThr                              740745750                                                                     LeuLeuAlaSerSerSerAsnValGlnThrProGlyValValIleAsn                              755760765                                                                     SerLysTyrPheAsnValSerThrGlySerSerLeuArgPheLysThr                        
      770775780                                                                     SerGlySerThrLysThrGlyPheSerIleGluLysAspLeuThrLeu                              785790795800                                                                  AsnAlaThrGlyGlyAsnIleThrLeuLeuGlnValGluGlyThrAsp                              805810815                                                                     GlyMetIleGlyLysGlyIleValAlaLysLysAsnIleThrPheGlu                              820825830                                                                     GlyGlyAsnIleThrPheGlySerArgLysAlaValThrGluIleGlu                              835840845                                                                     GlyAsnValThrIleAsnAsnAsnAlaAsnValThrLeuIleGlySer                              850855860                                                                     AspPheAspAsnHisGlnLysProLeuThrIleLysLysAspValIle                              865870875880                                                                  IleAsnSerGlyAsnLeuThrAlaGlyGlyAsnIleValAsnIleAla                              885890895                                                                     GlyAsnLeuThrValGluSerAsnAlaAsnPheLysAlaIleThrAsn                              900905910                                                                     PheThrPheAsnValGlyGlyLeuPheAspAsnLysGlyAsnSerAsn                              915920925                                                                     IleSerIleAlaLysGlyGlyAlaArgPheLysAspIleAspAsnSer                              930935940                                                                     LysAsnLeuSerIleThrThrAsnSerSerSerThrTyrArgThrIle                              945950955960                                                                  IleSerGlyAsnIleThrAsnLysAsnGlyAspLeuAsnIleThrAsn                              965970975                                                                     
GluGlySerAspThrGluMetGlnIleGlyGlyAspValSerGlnLys                              980985990                                                                     GluGlyAsnLeuThrIleSerSerAspLysIleAsnIleThrLysGln                              99510001005                                                                   IleThrIleLysAlaGlyValAspGlyGluAsnSerAspSerAspAla                              101010151020                                                                  ThrAsnAsnAlaAsnLeuThrIleLysThrLysGluLeuLysLeuThr                              1025103010351040                                                              GlnAspLeuAsnIleSerGlyPheAsnLysAlaGluIleThrAlaLys                              104510501055                                                                  AspGlySerAspLeuThrIleGlyAsnThrAsnSerAlaAspGlyThr                              106010651070                                                                  AsnAlaLysLysValThrPheAsnGlnValLysAspSerLysIleSer                              107510801085                                                                  AlaAspGlyHisLysValThrLeuHisSerLysValGluThrSerGly                              109010951100                                                                  SerAsnAsnAsnThrGluAspSerSerAspAsnAsnAlaGlyLeuThr                              1105111011151120                                                              IleAspAlaLysAsnValThrValAsnAsnAsnIleThrSerHisLys                              112511301135                                                                  AlaValSerIleSerAlaThrSerGlyGluIleThrThrLysThrGly                              114011451150                                                                  ThrThrIleAsnAlaThrThrGlyAsnValGluIleThrAlaGlnThr                              115511601165                                                                  GlySerIleLeuGlyGlyIleGluSerSerSerGlySerValThrLeu                              117011751180                                      
ThrAlaThrGluGlyAlaLeuAlaValSerAsnIleSerGlyAsnThr
1185           1190           1195           1200
ValThrValThrAlaAsnSerGlyAlaLeuThrThrLeuAlaGlySer
            1205           1210           1215
ThrIleLysGlyThrGluSerValThrThrSerSerGlnSerGlyAsp
         1220           1225           1230
IleGlyGlyThrIleSerGlyGlyThrValGluValLysAlaThrGlu
      1235           1240           1245
SerLeuThrThrGlnSerAsnSerLysIleLysAlaThrThrGlyGlu
   1250           1255           1260
AlaAsnValThrSerAlaThrGlyThrIleGlyGlyThrIleSerGly
1265           1270           1275           1280
AsnThrValAsnValThrAlaAsnAlaGlyAspLeuThrValGlyAsn
            1285           1290           1295
GlyAlaGluIleAsnAlaThrGluGlyAlaAlaThrLeuThrThrSer
         1300           1305           1310
SerGlyLysLeuThrThrGluAlaSerSerHisIleThrSerAlaLys
      1315           1320           1325
GlyGlnValAsnLeuSerAlaGlnAspGlySerValAlaGlySerIle
   1330           1335           1340
AsnAlaAlaAsnValThrLeuAsnThrThrGlyThrLeuThrThrVal
1345           1350           1355           1360
LysGlySerAsnIleAsnAlaThrSerGlyThrLeuValIleAsnAla
            1365           1370           1375
LysAspAlaGluLeuAsnGlyAlaAlaLeuGlyAsnHisThrValVal
         1380           1385           1390
AsnAlaThrAsnAlaAsnGlySerGlySerValIleAlaThrThrSer
      1395           1400           1405
SerArgValAsnIleThrGlyAspLeuIleThrIleAsnGlyLeuAsn
   1410           1415           1420
IleIleSerLysAsnGlyIleAsnThrValLeuLeuLysGlyValLys
1425           1430           1435           1440
IleAspValLysTyrIleGlnProGlyIleAlaSerValAspGluVal
            1445           1450           1455
IleGluAlaLysArgIleLeuGluLysValLysAspLeuSerAspGlu
         1460           1465           1470
GluArgGluAlaLeuAlaLysLeuGlyValSerAlaValArgPheIle
      1475           1480           1485
GluProAsnAsnThrIleThrValAspThrGlnAsnGluPheAlaThr
   1490           1495           1500
ArgProLeuSerArgIleValIleSerGluGlyArgAlaCysPheSer
1505           1510           1515           1520
AsnSerAspGlyAlaThrValCysValAsnIleAlaAspAsnGlyArg
            1525           1530           1535

(2) INFORMATION FOR SEQ ID NO:3:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 4937 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:
TAAATATACAAGATAATAAAAATAAATCAAGATTTTTGTGATGACAAACAACAATTACAA 60
CACCTTTTTTGCAGTCTATATGCAAATATTTTAAAAAAATAGTATAAATCCGCCATATAA 120
AATGGTATAATCTTTCATCTTTCATCTTTAATCTTTCATCTTTCATCTTTCATCTTTCAT 180
CTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTT 240
CACATGAAATGATGAACCGAGGGAAGGGAGGGAGGGGCAAGAATGAAGAGGGAGCTGAAC 300
GAACGCAAATGATAAAGTAATTTAATTGTTCAACTAACCTTAGGAGAAAATATGAACAAG 360
ATATATCGTCTCAAATTCAGCAAACGCCTGAATGCTTTGGTTGCTGTGTCTGAATTGGCA 420
CGGGGTTGTGACCATTCCACAGAAAAAGGCTTCCGCTATGTTACTATCTTTAGGTGTAAC 480
CACTTAGCGTTAAAGCCACTTTCCGCTATGTTACTATCTTTAGGTGTAACATCTATTCCA 540
CAATCTGTTTTAGCAAGCGGCTTACAAGGAATGGATGTAGTACACGGCACAGCCACTATG 600
CAAGTAGATGGTAATAAAACCATTATCCGCAACAGTGTTGACGCTATCATTAATTGGAAA 660
CAATTTAACATCGACCAAAATGAAATGGTGCAGTTTTTACAAGAAAACAACAACTCCGCC 720
GTATTCAACCGTGTTACATCTAACCAAATCTCCCAATTAAAAGGGATTTTAGATTCTAAC 780
GGACAAGTCTTTTTAATCAACCCAAATGGTATCACAATAGGTAAAGACGCAATTATTAAC 840
ACTAATGGCTTTACGGCTTCTACGCTAGACATTTCTAACGAAAACATCAAGGCGCGTAAT 900
TTCACCTTCGAGCAAACCAAAGATAAAGCGCTCGCTGAAATTGTGAATCACGGTTTAATT 960
ACTGTCGGTAAAGACGGCAGTGTAAATCTTATTGGTGGCAAAGTGAAAAACGAGGGTGTG 1020
ATTAGCGTAAATGGTGGCAGCATTTCTTTACTCGCAGGGCAAAAAATCACCATCAGCGAT 1080
ATAATAAACCCAACCATTACTTACAGCATTGCCGCGCCTGAAAATGAAGCGGTCAATCTG 1140
GGCGATATTTTTGCCAAAGGCGGTAACATTAATGTCCGTGCTGCCACTATTCGAAACCAA 1200
GGTAAACTTTCTGCTGATTCTGTAAGCAAAGATAAAAGCGGCAATATTGTTCTTTCCGCC 1260
AAAGAGGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTCAAAATCAGCAAGCTAAAGGC 1320
GGCAAGCTGATGATTACAGGCGATAAAGTCACATTAAAAACAGGTGCAGTTATCGACCTT 1380
TCAGGTAAAGAAGGGGGAGAAACTTACCTTGGCGGTGACGAGCGCGGCGAAGGTAAAAAC 1440
GGCATTCAATTAGCAAAGAAAACCTCTTTAGAAAAAGGCTCAACCATCAATGTATCAGGC 1500
AAAGAAAAAGGCGGACGCGCTATTGTGTGGGGCGATATTGCGTTAATTGACGGCAATATT 1560
AACGCTCAAGGTAGTGGTGATATCGCTAAAACCGGTGGTTTTGTGGAGACATCGGGGCAT 1620
TATTTATCCATTGACAGCAATGCAATTGTTAAAACAAAAGAGTGGTTGCTAGACCCTGAT 1680
GATGTAACAATTGAAGCCGAAGACCCCCTTCGCAATAATACCGGTATAAATGATGAATTC 1740
CCAACAGGCACCGGTGAAGCAAGCGACCCTAAAAAAAATAGCGAACTCAAAACAACGCTA 1800
ACCAATACAACTATTTCAAATTATCTGAAAAACGCCTGGACAATGAATATAACGGCATCA 1860
AGAAAACTTACCGTTAATAGCTCAATCAACATCGGAAGCAACTCCCACTTAATTCTCCAT 1920
AGTAAAGGTCAGCGTGGCGGAGGCGTTCAGATTGATGGAGATATTACTTCTAAAGGCGGA 1980
AATTTAACCATTTATTCTGGCGGATGGGTTGATGTTCATAAAAATATTACGCTTGATCAG 2040
GGTTTTTTAAATATTACCGCCGCTTCCGTAGCTTTTGAAGGTGGAAATAACAAAGCACGC 2100
GACGCGGCAAATGCTAAAATTGTCGCCCAGGGCACTGTAACCATTACAGGAGAGGGAAAA 2160
GATTTCAGGGCTAACAACGTATCTTTAAACGGAACGGGTAAAGGTCTGAATATCATTTCA 2220
TCAGTGAATAATTTAACCCACAATCTTAGTGGCACAATTAACATATCTGGGAATATAACA 2280
ATTAACCAAACTACGAGAAAGAACACCTCGTATTGGCAAACCAGCCATGATTCGCACTGG 2340
AACGTCAGTGCTCTTAATCTAGAGACAGGCGCAAATTTTACCTTTATTAAATACATTTCA 2400
AGCAATAGCAAAGGCTTAACAACACAGTATAGAAGCTCTGCAGGGGTGAATTTTAACGGC 2460
GTAAATGGCAACATGTCATTCAATCTCAAAGAAGGAGCGAAAGTTAATTTCAAATTAAAA 2520
CCAAACGAGAACATGAACACAAGCAAACCTTTACCAATTCGGTTTTTAGCCAATATCACA 2580
GCCACTGGTGGGGGCTCTGTTTTTTTTGATATATATGCCAACCATTCTGGCAGAGGGGCT 2640
GAGTTAAAAATGAGTGAAATTAATATCTCTAACGGCGCTAATTTTACCTTAAATTCCCAT 2700
GTTCGCGGCGATGACGCTTTTAAAATCAACAAAGACTTAACCATAAATGCAACCAATTCA 2760
AATTTCAGCCTCAGACAGACGAAAGATGATTTTTATGACGGGTACGCACGCAATGCCATC 2820
AATTCAACCTACAACATATCCATTCTGGGCGGTAATGTCACCCTTGGTGGACAAAACTCA 2880
AGCAGCAGCATTACGGGGAATATTACTATCGAGAAAGCAGCAAATGTTACGCTAGAAGCC 2940
AATAACGCCCCTAATCAGCAAAACATAAGGGATAGAGTTATAAAACTTGGCAGCTTGCTC 3000
GTTAATGGGAGTTTAAGTTTAACTGGCGAAAATGCAGATATTAAAGGCAATCTCACTATT 3060
TCAGAAAGCGCCACTTTTAAAGGAAAGACTAGAGATACCCTAAATATCACCGGCAATTTT 3120
ACCAATAATGGCACTGCCGAAATTAATATAACACAAGGAGTGGTAAAACTTGGCAATGTT 3180
ACCAATGATGGTGATTTAAACATTACCACTCACGCTAAACGCAACCAAAGAAGCATCATC 3240
GGCGGAGATATAATCAACAAAAAAGGAAGCTTAAATATTACAGACAGTAATAATGATGCT 3300
GAAATCCAAATTGGCGGCAATATCTCGCAAAAAGAAGGCAACCTCACGATTTCTTCCGAT 3360
AAAATTAATATCACCAAACAGATAACAATCAAAAAGGGTATTGATGGAGAGGACTCTAGT 3420
TCAGATGCGACAAGTAATGCCAACCTAACTATTAAAACCAAAGAATTGAAATTGACAGAA 3480
GACCTAAGTATTTCAGGTTTCAATAAAGCAGAGATTACAGCCAAAGATGGTAGAGATTTA 3540
ACTATTGGCAACAGTAATGACGGTAACAGCGGTGCCGAAGCCAAAACAGTAACTTTTAAC 3600
AATGTTAAAGATTCAAAAATCTCTGCTGACGGTCACAATGTGACACTAAATAGCAAAGTG 3660
AAAACATCTAGCAGCAATGGCGGACGTGAAAGCAATAGCGACAACGATACCGGCTTAACT 3720
ATTACTGCAAAAAATGTAGAAGTAAACAAAGATATTACTTCTCTCAAAACAGTAAATATC 3780
ACCGCGTCGGAAAAGGTTACCACCACAGCAGGCTCGACCATTAACGCAACAAATGGCAAA 3840
GCAAGTATTACAACCAAAACAGGTGATATCAGCGGTACGATTTCCGGTAACACGGTAAGT 3900
GTTAGCGCGACTGGTGATTTAACCACTAAATCCGGCTCAAAAATTGAAGCGAAATCGGGT 3960
GAGGCTAATGTAACAAGTGCAACAGGTACAATTGGCGGTACAATTTCCGGTAATACGGTA 4020
AATGTTACGGCAAACGCTGGCGATTTAACAGTTGGGAATGGCGCAGAAATTAATGCGACA 4080
GAAGGAGCTGCAACCTTAACCGCAACAGGGAATACCTTGACTACTGAAGCCGGTTCTAGC 4140
ATCACTTCAACTAAGGGTCAGGTAGACCTCTTGGCTCAGAATGGTAGCATCGCAGGAAGC 4200
ATTAATGCTGCTAATGTGACATTAAATACTACAGGCACCTTAACCACCGTGGCAGGCTCG 4260
GATATTAAAGCAACCAGCGGCACCTTGGTTATTAACGCAAAAGATGCTAAGCTAAATGGT 4320
GATGCATCAGGTGATAGTACAGAAGTGAATGCAGTCAACGCAAGCGGCTCTGGTAGTGTG 4380
ACTGCGGCAACCTCAAGCAGTGTGAATATCACTGGGGATTTAAACACAGTAAATGGGTTA 4440
AATATCATTTCGAAAGATGGTAGAAACACTGTGCGCTTAAGAGGCAAGGAAATTGAGGTG 4500
AAATATATCCAGCCAGGTGTAGCAAGTGTAGAAGAAGTAATTGAAGCGAAACGCGTCCTT 4560
GAAAAAGTAAAAGATTTATCTGATGAAGAAAGAGAAACATTAGCTAAACTTGGTGTAAGT 4620
GCTGTACGTTTTGTTGAGCCAAATAATACAATTACAGTCAATACACAAAATGAATTTACA 4680
ACCAGACCGTCAAGTCAAGTGATAATTTCTGAAGGTAAGGCGTGTTTCTCAAGTGGTAAT 4740
GGCGCACGAGTATGTACCAATGTTGCTGACGATGGACAGCCGTAGTCAGTAATTGACAAG 4800
GTAGATTTCATCCTGCAATGAAGTCATTTTATTTTCGTATTATTTACTGTGTGGGTTAAA 4860
GTTCAGTACGGGCTTTACCCATCTTGTAAAAAATTACGGAGAATACAATAAAGTATTTTT 4920
AACAGGTTATTATTATG 4937

(2) INFORMATION FOR SEQ ID NO:4:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1477 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
MetAsnLysIleTyrArgLeuLysPheSerLysArgLeuAsnAlaLeu
1           5              10             15
ValAlaValSerGluLeuAlaArgGlyCysAspHisSerThrGluLys
         20             25             30
GlySerGluLysProAlaArgMetLysValArgHisLeuAlaLeuLys
      35             40             45
ProLeuSerAlaMetLeuLeuSerLeuGlyValThrSerIleProGln
   50             55             60
SerValLeuAlaSerGlyLeuGlnGlyMetAspValValHisGlyThr
65             70             75             80
AlaThrMetGlnValAspGlyAsnLysThrIleIleArgAsnSerVal
            85             90             95
AspAlaIleIleAsnTrpLysGlnPheAsnIleAspGlnAsnGluMet
         100            105            110
ValGlnPheLeuGlnGluAsnAsnAsnSerAlaValPheAsnArgVal
      115            120            125
ThrSerAsnGlnIleSerGlnLeuLysGlyIleLeuAspSerAsnGly
   130            135            140
GlnValPheLeuIleAsnProAsnGlyIleThrIleGlyLysAspAla
145            150            155            160
IleIleAsnThrAsnGlyPheThrAlaSerThrLeuAspIleSerAsn
            165            170            175
GluAsnIleLysAlaArgAsnPheThrPheGluGlnThrLysAspLys
         180            185            190
AlaLeuAlaGluIleValAsnHisGlyLeuIleThrValGlyLysAsp
      195            200            205
GlySerValAsnLeuIleGlyGlyLysValLysAsnGluGlyValIle
   210            215            220
SerValAsnGlyGlySerIleSerLeuLeuAlaGlyGlnLysIleThr
225            230            235            240
IleSerAspIleIleAsnProThrIleThrTyrSerIleAlaAlaPro
            245            250            255
GluAsnGluAlaValAsnLeuGlyAspIlePheAlaLysGlyGlyAsn
         260            265            270
IleAsnValArgAlaAlaThrIleArgAsnGlnGlyLysLeuSerAla
      275            280            285
AspSerValSerLysAspLysSerGlyAsnIleValLeuSerAlaLys
   290            295            300
GluGlyGluAlaGluIleGlyGlyValIleSerAlaGlnAsnGlnGln
305            310            315            320
AlaLysGlyGlyLysLeuMetIleThrGlyAspLysValThrLeuLys
            325            330            335
ThrGlyAlaValIleAspLeuSerGlyLysGluGlyGlyGluThrTyr
         340            345            350
LeuGlyGlyAspGluArgGlyGluGlyLysAsnGlyIleGlnLeuAla
      355            360            365
LysLysThrSerLeuGluLysGlySerThrIleAsnValSerGlyLys
   370            375            380
GluLysGlyGlyPheAlaIleValTrpGlyAspIleAlaLeuIleAsp
385            390            395            400
GlyAsnIleAsnAlaGlnGlySerGlyAspIleAlaLysThrGlyGly
            405            410            415
PheValGluThrSerGlyHisAspLeuPheIleLysAspAsnAlaIle
         420            425            430
ValAspAlaLysGluTrpLeuLeuAspPheAspAsnValSerIleAsn
      435            440            445
AlaGluAspProLeuPheAsnAsnThrGlyIleAsnAspGluPhePro
   450            455            460
ThrGlyThrGlyGluAlaSerAspProLysLysAsnSerGluLeuLys
465            470            475            480
ThrThrLeuThrAsnThrThrIleSerAsnTyrLeuLysAsnAlaTrp
            485            490            495
ThrMetAsnIleThrAlaSerArgLysLeuThrValAsnSerSerIle
         500            505            510
AsnIleGlySerAsnSerHisLeuIleLeuHisSerLysGlyGlnArg
      515            520            525
GlyGlyGlyValGlnIleAspGlyAspIleThrSerLysGlyGlyAsn
   530            535            540
LeuThrIleTyrSerGlyGlyTrpValAspValHisLysAsnIleThr
545            550            555            560
LeuAspGlnGlyPheLeuAsnIleThrAlaAlaSerValAlaPheGlu
            565            570            575
GlyGlyAsnAsnLysAlaArgAspAlaAlaAsnAlaLysIleValAla
         580            585            590
GlnGlyThrValThrIleThrGlyGluGlyLysAspPheArgAlaAsn
      595            600            605
AsnValSerLeuAsnGlyThrGlyLysGlyLeuAsnIleIleSerSer
   610            615            620
ValAsnAsnLeuThrHisAsnLeuSerGlyThrIleAsnIleSerGly
625            630            635            640
AsnIleThrIleAsnGlnThrThrArgLysAsnThrSerTyrTrpGln
            645            650            655
ThrSerHisAspSerHisTrpAsnValSerAlaLeuAsnLeuGluThr
         660            665            670
GlyAlaAsnPheThrPheIleLysTyrIleSerSerAsnSerLysGly
      675            680            685
LeuThrThrGlnTyrArgSerSerAlaGlyValAsnPheAsnGlyVal
   690            695            700
AsnGlyAsnMetSerPheAsnLeuLysGluGlyAlaLysValAsnPhe
705            710            715            720
LysLeuLysProAsnGluAsnMetAsnThrSerLysProLeuProIle
            725            730            735
ArgPheLeuAlaAsnIleThrAlaThrGlyGlyGlySerValPhePhe
         740            745            750
AspIleTyrAlaAsnHisSerGlyArgGlyAlaGluLeuLysMetSer
      755            760            765
GluIleAsnIleSerAsnGlyAlaAsnPheThrLeuAsnSerHisVal
   770            775            780
ArgGlyAspAspAlaPheLysIleAsnLysAspLeuThrIleAsnAla
785            790            795            800
ThrAsnSerAsnPheSerLeuArgGlnThrLysAspAspPheTyrAsp
            805            810            815
GlyTyrAlaArgAsnAlaIleAsnSerThrTyrAsnIleSerIleLeu
         820            825            830
GlyGlyAsnValThrLeuGlyGlyGlnAsnSerSerSerSerIleThr
      835            840            845
GlyAsnIleThrIleGluLysAlaAlaAsnValThrLeuGluAlaAsn
   850            855            860
AsnAlaProAsnGlnGlnAsnIleArgAspArgValIleLysLeuGly
865            870            875            880
SerLeuLeuValAsnGlySerLeuSerLeuThrGlyGluAsnAlaAsp
            885            890            895
IleLysGlyAsnLeuThrIleSerGluSerAlaThrPheLysGlyLys
         900            905            910
ThrArgAspThrLeuAsnIleThrGlyAsnPheThrAsnAsnGlyThr
      915            920            925
AlaGluIleAsnIleThrGlnGlyValValLysLeuGlyAsnValThr
   930            935            940
AsnAspGlyAspLeuAsnIleThrThrHisAlaLysArgAsnGlnArg
945            950            955            960
SerIleIleGlyGlyAspIleIleAsnLysLysGlySerLeuAsnIle
            965            970            975
ThrAspSerAsnAsnAspAlaGluIleGlnIleGlyGlyAsnIleSer
         980            985            990
GlnLysGluGlyAsnLeuThrIleSerSerAspLysIleAsnIleThr
      995            1000           1005
LysGlnIleThrIleLysLysGlyIleAspGlyGluAspSerSerSer
   1010           1015           1020
AspAlaThrSerAsnAlaAsnLeuThrIleLysThrLysGluLeuLys
1025           1030           1035           1040
LeuThrGluAspLeuSerIleSerGlyPheAsnLysAlaGluIleThr
            1045           1050           1055
AlaLysAspGlyArgAspLeuThrIleGlyAsnSerAsnAspGlyAsn
         1060           1065           1070
SerGlyAlaGluAlaLysThrValThrPheAsnAsnValLysAspSer
      1075           1080           1085
LysIleSerAlaAspGlyHisAsnValThrLeuAsnSerLysValLys
   1090           1095           1100
ThrSerSerSerAsnGlyGlyArgGluSerAsnSerAspAsnAspThr
1105           1110           1115           1120
GlyLeuThrIleThrAlaLysAsnValGluValAsnLysAspIleThr
            1125           1130           1135
SerLeuLysThrValAsnIleThrAlaSerGluLysValThrThrThr
         1140           1145           1150
AlaGlySerThrIleAsnAlaThrAsnGlyLysAlaSerIleThrThr
      1155           1160           1165
LysThrGlyAspIleSerGlyThrIleSerGlyAsnThrValSerVal
   1170           1175           1180
SerAlaThrValAspLeuThrThrLysSerGlySerLysIleGluAla
1185           1190           1195           1200
LysSerGlyGluAlaAsnValThrSerAlaThrGlyThrIleGlyGly
            1205           1210           1215
ThrIleSerGlyAsnThrValAsnValThrAlaAsnAlaGlyAspLeu
         1220           1225           1230
ThrValGlyAsnGlyAlaGluIleAsnAlaThrGluGlyAlaAlaThr
      1235           1240           1245
LeuThrAlaThrGlyAsnThrLeuThrThrGluAlaGlySerSerIle
   1250           1255           1260
ThrSerThrLysGlyGlnValAspLeuLeuAlaGlnAsnGlySerIle
1265           1270           1275           1280
AlaGlySerIleAsnAlaAlaAsnValThrLeuAsnThrThrGlyThr
            1285           1290           1295
LeuThrThrValAlaGlySerAspIleLysAlaThrSerGlyThrLeu
         1300           1305           1310
ValIleAsnAlaLysAspAlaLysLeuAsnGlyAspAlaSerGlyAsp
      1315           1320           1325
SerThrGluValAsnAlaValAsnAlaSerGlySerGlySerValThr
   1330           1335           1340
AlaAlaThrSerSerSerValAsnIleThrGlyAspLeuAsnThrVal
1345           1350           1355           1360
AsnGlyLeuAsnIleIleSerLysAspGlyArgAsnThrValArgLeu
            1365           1370           1375
ArgGlyLysGluIleGluValLysTyrIleGlnProGlyValAlaSer
         1380           1385           1390
ValGluGluValIleGluAlaLysArgValLeuGluLysValLysAsp
      1395           1400           1405
LeuSerAspGluGluArgGluThrLeuAlaLysLeuGlyValSerAla
   1410           1415           1420
ValArgPheValGluProAsnAsnThrIleThrValAsnThrGlnAsn
1425           1430           1435           1440
GluPheThrThrArgProSerSerGlnValIleIleSerGluGlyLys
            1445           1450           1455
AlaCysPheSerSerGlyAsnGlyAlaArgValCysThrAsnValAla
         1460           1465           1470
AspAspGlyGlnPro
      1475

(2) INFORMATION FOR SEQ ID NO:5:
(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 9171 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA (genomic)
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:
ACAGCGTTCTCTTAATACTAGTACAAACCCACAATAAAATATGACAAACAACAATTACAA 60
CACCTTTTTTGCAGTCTATATGCAAATATTTTAAAAAATAGTATAAATCCGCCATATAAA 120
ATGGTATAATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATC 180
TTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTC 240
ACATGAAATGATGAACCGAGGGAAGGGAGGGAGGGGCAAGAATGAAGAGGGAGCTGAACG 300
AACGCAAATGATAAAGTAATTTAATTGTTCAACTAACCTTAGGAGAAAATATGAACAAGA 360
TATATCGTCTCAAATTCAGCAAACGCCTGAATGCTTTGGTTGCTGTGTCTGAATTGGCAC 420
GGGGTTGTGACCATTCCACAGAAAAAGGCAGCGAAAAACCTGCTCGCATGAAAGTGCGTC 480
ACTTAGCGTTAAAGCCACTTTCCGCTATGTTACTATCTTTAGGTGTAACATCTATTCCAC 540
AATCTGTTTTAGCAAGCGGCTTACAAGGAATGGATGTAGTACACGGCACAGCCACTATGC 600
AAGTAGATGGTAATAAAACCATTATCCGCAACAGTGTTGACGCTATCATTAATTGGAAAC 660
AATTTAACATCGACCAAAATGAAATGGTGCAGTTTTTACAAGAAAACAACAACTCCGCCG 720
TATTCAACCGTGTTACATCTAACCAAATCTCCCAATTAAAAGGGATTTTAGATTCTAACG 780
GACAAGTCTTTTTAATCAACCCAAATGGTATCACAATAGGTAAAGACGCAATTATTAACA 840
CTAATGGCTTTACGGCTTCTACGCTAGACATTTCTAACGAAAACATCAAGGCGCGTAATT 900
TCACCTTCGAGCAAACCAAAGATAAAGCGCTCGCTGAAATTGTGAATCACGGTTTAATTA 960
CTGTCGGTAAAGACGGCAGTGTAAATCTTATTGGTGGCAAAGTGAAAAACGAGGGTGTGA 1020
TTAGCGTAAATGGTGGCAGCATTTCTTTACTCGCAGGGCAAAAAATCACCATCAGCGATA 1080
TAATAAACCCAACCATTACTTACAGCATTGCCGCGCCTGAAAATGAAGCGGTCAATCTGG 1140
GCGATATTTTTGCCAAAGGCGGTAACATTAATGTCCGTGCTGCCACTATTCGAAACCAAG 1200
CTTTCCGCCAAAGAGGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTCAAAATCAGCAA 1260
GCTAAAGGCGGCAAGCTGATGATTACAGGCGATAAAGTCACATTAAAAACAGGTGCAGTT 1320
ATCGACCTTTCAGGTAAAGAAGGGGGAGAAACTTACCTTGGCGGTGACGAGCGCGGCGAA 1380
GGTAAAAACGGCATTCAATTAGCAAAGAAAACCTCTTTAGAAAAAGGCTCAACCATCAAT 1440
GTATCAGGCAAAGAAAAAGGCGGACGCGCTATTGTGTGGGGCGATATTGCGTTAATTGAC 1500
GGCAATATTAACGCTCAAGGTAGTGGTGATATCGCTAAAACCGGTGGTTTTGTGGAGACG 1560
TCGGGGCATGATTTATTCATCAAAGACAATGCAATTGTTGACGCCAAAGAGTGGTTGTTA 1620
GACCCGGATAATGTATCTATTAATGCAGAAACAGCAGGACGCAGCAATACTTCAGAAGAC 1680
GATGAATACACGGGATCCGGGAATAGTGCCAGCACCCCAAAACGAAACAAAGAAAAGACA 1740
ACATTAACAAACACAACTCTTGAGAGTATACTAAAAAAAGGTACCTTTGTTAACATCACT 1800
GCTAATCAACGCATCTATGTCAATAGCTCCATTAATTTATCCAATGGCAGCTTAACTCTT 1860
TGGAGTGAGGGTCGGAGCGGTGGCGGCGTTGAGATTAACAACGATATTACCACCGGTGAT 1920
GATACCAGAGGTGCAAACTTAACAATTTACTCAGGCGGCTGGGTTGATGTTCATAAAAAT 1980
ATCTCACTCGGGGCGCAAGGTAACATAAACATTACAGCTAAACAAGATATCGCCTTTGAG 2040
AAAGGAAGCAACCAAGTCATTACAGGTCAAGGGACTATTACCTCAGGCAATCAAAAAGGT 2100
TTTAGATTTAATAATGTCTCTCTAAACGGCACTGGCAGCGGACTGCAATTCACCACTAAA 2160
AGAACCAATAAATACGCTATCACAAATAAATTTGAAGGGACTTTAAATATTTCAGGGAAA 2220
GTGAACATCTCAATGGTTTTACCTAAAAATGAAAGTGGATATGATAAATTCAAAGGACGC 2280
ACTTACTGGAATTTAACCTCGAAAGTGGATATGATAAATTCAAAGGACGCCCTCACTATT 2340
GACTCCAGAGGAAGCGATAGTGCAGGCACACTTACCCAGCCTTATAATTTAAACGGTATA 2400
TCATTCAACAAAGACACTACCTTTAATGTTGAACGAAATGCAAGAGTCAACTTTGACATC 2460
AAGGCACCAATAGGGATAAATAAGTATTCTAGTTTGAATTACGCATCATTTAATGGAAAC 2520
ATTTCAGTTTCGGGAGGGGGGAGTGTTGATTTCACACTTCTCGCCTCATCCTCTAACGTC 2580
CAAACCCCCGGTGTAGTTATAAATTCTAAATACTTTAATGTTTCAACAGGGTCAAGTTTA 2640
AGATTTAAAACTTCAGGCTCAACAAAAACTGGCTTCTCAATAGAGAAAGATTTAACTTTA 2700
AATGCCACCGGAGGCAACATAACACTTTTGCAAGTTGAAGGCACCGATGGAATGATTGGT 2760
AAAGGCATTGTAGCCAAAAAAAACATAACCTTTGAAGGAGGTAAGATGAGGTTTGGCTCC 2820
AGGAAAGCCGTAACAGAAATCGAAGGCAATGTTACTATCAATAACAACGCTAACGTCACT 2880
CTTATCGGTTCGGATTTTGACAACCATCAAAAACCTTTAACTATTAAAAAAGATGTCATC 2940
ATTAATAGCGGCAACCTTACCGCTGGAGGCAATATTGTCAATATAGCCGGAAATCTTACC 3000
GTTGAAAGTAACGCTAATTTCAAAGCTATCACAAATTTCACTTTTAATGTAGGCGGCTTG 3060
TTTGACAACAAAGGCAATTCAAATATTTCCATTGCCAAAGGAGGGGCTCGCTTTAAAGAC 3120
ATTGATAATTCCAAGAATTTAAGCATCACCACCAACTCCAGCTCCACTTACCGCACTATT 3180
ATAAGCGGCAATATAACCAATAAAAACGGTGATTTAAATATTACGAACGAAGGTAGTGAT 3240
ACTGAAATGCAAATTGGCGGCGATGTCTCGCAAAAAGAAGGTAATCTCACGATTTCTTCT 3300
GACAAAATCAATATTACCAAACAGATAACAATCAAGGCAGGTGTTGATGGGGAGAATTCC 3360
GATTCAGACGCGACAAACAATGCCAATCTAACCATTAAAACCAAAGAATTGAAATTAACG 3420
CAAGACCTAAATATTTCAGGTTTCAATAAAGCAGAGATTACAGCTAAAGATGGTAGTGAT 3480
TTAACTATTGGTAACACCAATAGTGCTGATGGTACTAATGCCAAAAAAGTAACCTTTAAC 3540
CAGGTTAAAGATTCAAAAATCTCTGCTGACGGTCACAAGGTGACACTACACAGCAAAGTG 3600
GAAACATCCGGTAGTAATAACAACACTGAAGATAGCAGTGACAATAATGCCGGCTTAACT 3660
ATCGATGCAAAAAATGTAACAGTAAACAACAATATTACTTCTCACAAAGCAGTGAGCATC 3720
TCTGCGACAAGTGGAGAAATTACCACTAAAACAGGTACAACCATTAACGCAACCACTGGT 3780
AACGTGGAGATAACCGCTCAAACAGGTAGTATCCTAGGTGGAATTGAGTCCAGCTCTGGC 3840
TCTGTAACACTTACTGCAACCGAGGGCGCTCTTGCTGTAAGCAATATTTCGGGCAACACC 3900
GTTACTGTTACTGCAAATAGCGGTGCATTAACCACTTTGGCAGGCTCTACAATTAAAGGA 3960
ACCGAGAGTGTAACCACTTCAAGTCAATCAGGCGATATCGGCGGTACGATTTCTGGTGGC 4020
ACAGTAGAGGTTAAAGCAACCGAAAGTTTAACCACTCAATCCAATTCAAAAATTAAAGCA 4080
ACAACAGGCGAGGCTAACGTAACAAGTGCAACAGGTACAATTGGTGGTACGATTTCCGGT 4140
AATACGGTAAATGTTACGGCAAACGCTGGCGATTTAACAGTTGGGAATGGCGCAGAAATT 4200
AATGCGACAGAAGGAGCTGCAACCTTAACTACATCATCGGGCAAATTAACTACCGAAGCT 4260
AGTTCACACATTACTTCAGCCAAGGGTCAGGTAAATCTTTCAGCTCAGGATGGTAGCGTT 4320
GCAGGAAGTATTAATGCCGCCAATGTGACACTAAATACTACAGGCACTTTAACTACCGTG 4380
AAGGGTTCAAACATTAATGCAACCAGCGGTACCTTGGTTATTAACGCAAAAGACGCTGAG 4440
CTAAATGGCGCAGCATTGGGTAACCACACAGTGGTAAATGCAACCAACGCAAATGGCTCC 4500
GGCAGCGTAATCGCGACAACCTCAAGCAGAGTGAACATCACTGGGGATTTAATCACAATA 4560
AATGGATTAAATATCATTTCAAAAAACGGTATAAACACCGTACTGTTAAAAGGCGTTAAA 4620
ATTGATGTGAAATACATTCAACCGGGTATAGCAAGCGTAGATGAAGTAATTGAAGCGAAA 4680
CGCATCCTTGAGAAGGTAAAAGATTTATCTGATGAAGAAAGAGAAGCGTTAGCTAAACTT 4740
GGCGTAAGTGCTGTACGTTTTATTGAGCCAAATAATACAATTACAGTCGATACACAAAAT 4800
GAATTTGCAACCAGACCATTAAGTCGAATAGTGATTTCTGAAGGCAGGGCGTGTTTCTCA 4860
AACAGTGATGGCGCGACGGTGTGCGTTAATATCGCTGATAACGGGCGGTAGCGGTCAGTA 4920
ATTGACAAGGTAGATTTCATCCTGCAATGAAGTCATTTTATTTTCGTATTATTTACTGTG 4980
TGGGTTAAAGTTCAGTACGGGCTTTACCCATCTTGTAAAAAATTACGGAGAATACAATAA 5040
AGTATTTTTAACAGGTTATTATTATGAAAAATATAAAAAGCAGATTAAAACTCAGTGCAA 5100
TATCAGTATTGCTTGGCCTGGCTTCTTCATCATTGTATGCAGAAGAAGCGTTTTTAGTAA 5160
AAGGCTTTCAGTTATCTGGTGCACTTGAAACTTTAAGTGAAGACGCCCAACTGTCTGTAG 5220
CAAAATCTTTATCTAAATACCAAGGCTCGCAAACTTTAACAAACCTAAAAACAGCACAGC 5280
TTGAATTACAGGCTGTGCTAGATAAGATTGAGCCAAATAAGTTTGATGTGATATTGCCAC 5340
AACAAACCATTACGGATGGCAATATTATGTTTGAGCTAGTCTCGAAATCAGCCGCAGAAA 5400
GCCAAGTTTTTTATAAGGCGAGCCAGGGTTATAGTGAAGAAAATATCGCTCGTAGCCTGC 5460
CATCTTTGAAACAAGGAAAAGTGTATGAAGATGGTCGTCAGTGGTTCGATTTGCGTGAAT 5520
TCAATATGGCAAAAGAAAATCCACTTAAAGTCACTCGCGTGCATTACGAGTTAAACCCTA 5580
AAAACAAAACCTCTGATTTGGTAGTTGCAGGTTTTTCGCCTTTTGGCAAAACGCGTAGCT 5640
TTGTTTCCTATGATAATTTCGGCGCAAGGGAGTTTAACTATCAACGTGTAAGTCTAGGTT 5700
TTGTAAATGCCAATTTGACCGGACATGATGATGTATTAAATCTAAACGCATTGACCAATG5760              TAAAAGCACCATCAAAATCTTATGCGGTAGGCATAGGATATACTTATCCGTTTTATGATA5820              AACACCAATCCTTAAGTCTTTATACCAGCATGAGTTATGCTGATTCTAATGATATCGACG5880              GCTTACCAAGTGCGATTAATCGTAAATTATCAAAAGGTCAATCTATCTCTGCGAATCTGA5940              AATGGAGTTATTATCTCCCGACATTTAACCTTGGAATGGAAGACCAGTTTAAAATTAATT6000              TAGGCTACAACTACCGCCATATTAATCAAACATCCGAGTTAAACACCCTGGGTGCAACGA6060              AGAAAAAATTTGCAGTATCAGGCGTAAGTGCAGGCATTGATGGACATATCCAATTTACCC6120              CTAAAACAATCTTTAATATTGATTTAACTCATCATTATTACGCGAGTAAATTACCAGGCT6180              CTTTTGGAATGGAGCGCATTGGCGAAACATTTAATCGCAGCTATCACATTAGCACAGCCA6240              GTTTAGGGTTGAGTCAAGAGTTTGCTCAAGGTTGGCATTTTAGCAGTCAATTATCGGGTC6300              AGTTTACTCTACAAGATATAAGTAGCATAGATTTATTCTCTGTAACAGGTACTTATGGCG6360              TCAGAGGCTTTAAATACGGCGGTGCAAGTGGTGAGCGCGGTCTTGTATGGCGTAATGAAT6420              TAAGTATGCCAAAATACACCCGCTTTCAAATCAGCCCTTATGCGTTTTATGATGCAGGTC6480              AGTTCCGTTATAATAGCGAAAATGCTAAAACTTACGGCGAAGATATGCACACGGTATCCT6540              CTGCGGGTTTAGGCATTAAAACCTCTCCTACACAAAACTTAAGCTTAGATGCTTTTGTTG6600              CTCGTCGCTTTGCAAATGCCAATAGTGACAATTTGAATGGCAACAAAAAACGCACAAGCT6660              CACCTACAACCTTCTGGGGTAGATTAACATTCAGTTTCTAACCCTGAAATTTAATCAACT6720              GGTAAGCGTTCCGCCTACCAGTTTATAACTATATGCTTTACCCGCCAATTTACAGTCTAT6780              ACGCAACCCTGTTTTCATCCTTATATATCAAACAAACTAAGCAAACCAAGCAAACCAAGC6840              AAACCAAGCAAACCAAGCAAACCAAGCAAACCAAGCAAACCAAGCAAACCAAGCAAACCA6900              AGCAAACCAAGCAAACCAAGCAAACCAAGCAAACCAAGCAATGCTAAAAAACAATTTATA6960              TGATAAACTAAAACATACTCCATACCATGGCAATACAAGGGATTTAATAATATGACAAAA7020              GAAAATTTACAAAGTGTTCCACAAAATACGACCGCTTCACTTGTAGAATCAAACAACGAC7080              CAAACTTCCCTGCAAATACTTAAACAACCACCCAAACCCAACCTATTACGCCTGGAACAA7140              CATGTCGCCAAAAAAGATTATGAGCTTGCTTGCCGCGAATTAATGGCGATTTTGGAAAAA7200              
ATGGACGCTAATTTTGGAGGCGTTCACGATATTGAATTTGACGCACCTGCTCAGCTGGCA7260              TATCTACCCGAAAAACTACTAATTCATTTTGCCACTCGTCTCGCTAATGCAATTACAACA7320              CTCTTTTCCGACCCCGAATTGGCAATTTCCGAAGAAGGGGCATTAAAGATGATTAGCCTG7380              CAACGCTGGTTGACGCTGATTTTTGCCTCTTCCCCCTACGTTAACGCAGACCATATTCTC7440              AATAAATATAATATCAACCCAGATTCCGAAGGTGGCTTTCATTTAGCAACAGACAACTCT7500              TCTATTGCTAAATTCTGTATTTTTTACTTACCCGAATCCAATGTCAATATGAGTTTAGAT7560              GCGTTATGGGCAGGGAATCAACAACTTTGTGCTTCATTGTGTTTTGCGTTGCAGTCTTCA7620              CGTTTTATTGGTACTGCATCTGCGTTTCATAAAAGAGCGGTGGTTTTACAGTGGTTTCCT7680              AAAAAACTCGCCGAAATTGCTAATTTAGATGAATTGCCTGCAAATATCCTTCATGATGTA7740              TATATGCACTGCAGTTATGATTTAGCAAAAAACAAGCACGATGTTAAGCGTCCATTAAAC7800              GAACTTGTCCGCAAGCATATCCTCACGCAAGGATGGCAAGACCGCTACCTTTACACCTTA7860              GGTAAAAAGGACGGCAAACCTGTGATGATGGTACTGCTTGAACATTTTAATTCGGGACAT7920              TCGATTTATCGCACGCATTCAACTTCAATGATTGCTGCTCGAGAAAAATTCTATTTAGTC7980              GGCTTAGGCCATGAGGGCGTTGATAACATAGGTCGAGAAGTGTTTGACGAGTTCTTTGAA8040              ATCAGTAGCAATAATATAATGGAGAGACTGTTTTTTATCCGTAAACAGTGCGAAACTTTC8100              CAACCCGCAGTGTTCTATATGCCAAGCATTGGCATGGATATTACCACGATTTTTGTGAGC8160              AACACTCGGCTTGCCCCTATTCAAGCTGTAGCCTTGGGTCATCCTGCCACTACGCATTCT8220              GAATTTATTGATTATGTCATCGTAGAAGATGATTATGTGGGCAGTGAAGATTGTTTTAGC8280              GAAACCCTTTTACGCTTACCCAAAGATGCCCTACCTTATGTACCATCTGCACTCGCCCCA8340              CAAAAAGTGGATTATGTACTCAGGGAAAACCCTGAAGTAGTCAATATCGGTATTGCCGCT8400              ACCACAATGAAATTAAACCCTGAATTTTTGCTAACATTGCAAGAAATCAGAGATAAAGCT8460              AAAGTCAAAATACATTTTCATTTCGCACTTGGACAATCAACAGGCTTGACACACCCTTAT8520              GTCAAATGGTTTATCGAAAGCTATTTAGGTGACGATGCCACTGCACATCCCCACGCACCT8580              TATCACGATTATCTGGCAATATTGCGTGATTGCGATATGCTACTAAATCCGTTTCCTTTC8640              GGTAATACTAACGGCATAATTGATATGGTTACATTAGGTTTAGTTGGTGTATGCAAAACG8700              
GGGGATGAAGTACATGAACATATTGATGAAGGTCTGTTTAAACGCTTAGGACTACCAGAA8760              TGGCTGATAGCCGACACACGAGAAACATATATTGAATGTGCTTTGCGTCTAGCAGAAAAC8820              CATCAAGAACGCCTTGAACTCCGTCGTTACATCATAGAAAACAACGGCTTACAAAAGCTT8880              TTTACAGGCGACCCTCGTCCATTGGGCAAAATACTGCTTAAGAAAACAAATGAATGGAAG8940              CGGAAGCACTTGAGTAAAAAATAACGGTTTTTTAAAGTAAAAGTGCGGTTAATTTTCAAA9000              GCGTTTTAAAAACCTCTCAAAAATCAACCGCACTTTTATCTTTATAACGCTCCCGCGCGC9060              TGACAGTTTATCTCTTTCTTAAAATACCCATAAAATTGTGGCAATAGTTGGGTAATCAAA9120              TTCAATTGTTGATACGGCAAACTAAAGACGGCGCGTTCTTCGGCAGTCATC9171                       (2) INFORMATION FOR SEQ ID NO:6:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 9323 base pairs                                                   (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:                                       CGCCACTTCAATTTTGGATTGTTGAAATTCAACTAACCAAAAAGTGCGGTTAAAATCTGT60                GGAGAAAATAGGTTGTAGTGAAGAACGAGGTAATTGTTCAAAAGGATAAAGCTCTCTTAA120               TTGGGCATTGGTTGGCGTTTCTTTTTCGGTTAATAGTAAATTATATTCTGGACGACTATG180               CAATCCACCAACAACTTTACCGTTGGTTTTAAGCGTTAATGTAAGTTCTTGCTCTTCTTG240               GCGAATACGTAATCCCATTTTTTGTTTAGCAAGAAAATGATCGGGATAATCATAATAGGT300               GTTGCCCAAAAATAAATTTTGATGTTCTAAAATCATAAATTTTGCAAGATATTGTGGCAA360               TTCAATACCTATTTGTGGCGAAATCGCCAATTTTAATTCAATTTCTTGTAGCATAATATT420               TCCCACTCAAATCAACTGGTTAAATATACAAGATAATAAAAATAAATCAAGATTTTTGTG480               ATGACAAACAACAATTACAACACCTTTTTTGCAGTCTATATGCAAATATTTTAAAAAAAT540               
AGTATAAATCCGCCATATAAAATGGTATAATCTTTCATCTTTCATCTTTCATCTTTCATC600               TTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTCATCTTTC660               ATCTTTCATCTTTCATCTTTCACATGAAATGATGAACCGAGGGAAGGGAGGGAGGGGCAA720               GAATGAAGAGGGAGCTGAACGAACGCAAATGATAAAGTAATTTAATTGTTCAACTAACCT780               TAGGAGAAAATATGAACAAGATATATCGTCTCAAATTCAGCAAACGCCTGAATGCTTTGG840               TTGCTGTGTCTGAATTGGCACGGGGTTGTGACCATTCCACAGAAAAAGGCAGCGAAAAAC900               CTGCTCGCATGAAAGTGCGTCACTTAGCGTTAAAGCCACTTTCCGCTATGTTACTATCTT960               TAGGTGTAACATCTATTCCACAATCTGTTTTAGCAAGCGGCAATTTAACATCGACCAAAA1020              TGAAATGGTGCAGTTTTTACAAGAAAACAAGTAATAAAACCATTATCCGCAACAGTGTTG1080              ACGCTATCATTAATTGGAAACAATTTAACATCGACCAAAATGAAATGGTGCAGTTTTTAC1140              AAGAAAACAACAACTCCGCCGTATTCAACCGTGTTACATCTAACCAAATCTCCCAATTAA1200              AAGGGATTTTAGATTCTAACGGACAAGTCTTTTTAATCAACCCAAATGGTATCACAATAG1260              GTAAAGACGCAATTATTAACACTAATGGCTTTACGGCTTCTACGCTAGACATTTCTAACG1320              AAAACATCAAGGCGCGTAATTTCACCTTCGAGCAAACCAAAGATAAAGCGCTCGCTGAAA1380              TTGTGAATCACGGTTTAATTACTGTCGGTAAAGACGGCAGTGTAAATCTTATTGGTGGCA1440              AAGTGAAAAACGAGGGTGTGATTAGCGTAAATGGTGGCAGCATTTCTTTACTCGCAGGGC1500              AAAAAATCACCATCAGCGATATAATAAACCCAACCATTACTTACAGCATTGCCGCGCCTG1560              AAAATGAAGCGGTCAATCTGGGCGATATTTTTGCCAAAGGCGGTAACATTAATGTCCGTG1620              CTGCCACTATTCGAAACCAAGGTAAACTTTCTGCTGATTCTGTAAGCAAAGATAAAAGCG1680              GCAATATTGTTCTTTCCGCCAAAGAGGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTC1740              AAAATCAGCAAGCTAAAGGCGGCAAGCTGATGATAAAGTCCGATAAAGTCACATTAAAAA1800              CAGGTGCAGTTATCGACCTTTCAGGTAAAGAAGGGGGAGAAACTTACCTTGGCGGTGACG1860              AGCGCGGCGAAGGTAAAAACGGCATTCAATTAGCAAAGAAAACCTCTTTAGAAAAAGGCT1920              CAACCATCAATGTATCAGGCAAAGAAAAAGGCGGACGCGCTATTGTGTGGGGCGATATTG1980              CGTTAATTGACGGCAATATTAACGCTCAAGGTAGTGGTGATATCGCTAAAACCGGTGGTT2040              
TTGTGGAGACATCGGGGCATTATTTATCCATTGACAGCAATGCAATTGTTAAAACAAAAG2100              AGTGGTTGCTAGACCCTGATGATGTAACAATTGAAGCCGAAGACCCCCTTCGCAATAATA2160              CCGGTATAAATGATGAATTCCCAACAGGCACCGGTGAAGCAAGCGACCCTAAAAAAAATA2220              GCGAACTCAAAACAACGCTAACCAATACAACTATTTCAAATTATCTGAAAAACGCCTGGA2280              CAATGAATATAACGGCATCAAGAAAACTTACCGTTAATAGCTCAATCAACATCGGAAGCA2340              ACTCCCACTTAATTCTCCATAGTAAAGGTCAGCGTGGCGGAGGCGTTCAGATTGATGGAG2400              ATATTACTTCTAAAGGCGGAAATTTAACCATTTATTCTGGCGGATGGGTTGATGTTCATA2460              AAAATATTACGCTTGATCAGGGTTTTTTAAATATTACCGCCGCTTCCGTAGCTTTTGAAG2520              GTGGAAATAACAAAGCACGCGACGCGGCAAATGCTAAAATTGTCGCCCAGGGCACTGTAA2580              CCATTACAGGAGAGGGAAAAGATTTCAGGGCTAACAACGTATCTTTAAACGGAACGGGTA2640              AAGGTCTGAATATCATTTCATCAGTGAATAATTTAACCCACAATCTTAGTGGCACAATTA2700              ACATATCTGGGAATATAACAATTAACCAAACTACGAGAAAGAACACCTCGTATTGGCAAA2760              CCAGCCATGATTCGCACTGGAACGTCAGTGCTCTTAATCTAGAGACAGGCGCAAATTTTA2820              CCTTTATTAAATACATTTCAAGCAATAGCAAAGGCTTAACAACACAGTATAGAAGCTCTG2880              CAGGGGTGAATTTTAACGGCGTAAATGGCAACATGTCATTCAATCTCAAAGAAGGAGCGA2940              AAGTTAATTTCAAATTAAAACCAAACGAGAACATGAACACAAGCAAACCTTTACCAATTC3000              GGTTTTTAGCCAATATCACAGCCACTGGTGGGGGCTCTGTTTTTTTTGATATATATGCCA3060              ACCATTCTGGCAGAGGGGCTGAGTTAAAAATGAGTGAAATTAATATCTCTAACGGCGCTA3120              ATTTTACCTTAAATTCCCATGTTCGCGGCGATGACGCTTTTAAAATCAACAAAGACTTAA3180              CCATAAATGCAACCAATTCAAATTTCAGCCTCAGACAGACGAAAGATGATTTTTATGACG3240              GGTACGCACGCAATGCCATCAATTCAACCTACAACATATCCATTCTGGGCGGTAATGTCA3300              CCCTTGGTGGACAAAACTCAAGCAGCAGCATTACGGGGAATATTACTATCGAGAAAGCAG3360              CAAATGTTACGCTAGAAGCCAATAACGCCCCTAATCAGCAAAACATAAGGGATAGAGTTA3420              TAAAACTTGGCAGCTTGCTCGTTAATGGGAGTTTAAGTTTAACTGGCGAAAATGCAGATA3480              TTAAAGGCAATCTCACTATTTCAGAAAGCGCCACTTTTAAAGGAAAGACTAGAGATACCC3540              
TAAATATCACCGGCAATTTTACCAATAATGGCACTGCCGAAATTAATATAACACAAGGAG3600              TGGTAAAACTTGGCAATGTTACCAATGATGGTGATTTAAACATTACCACTCACGCTAAAC3660              GCAACCAAAGAAGCATCATCGGCGGAGATATAATCAACAAAAAAGGAAGCTTAAATATTA3720              CAGACAGTAATAATGATGCTGAAATCCAAATTGGCGGCAATATCTCGCAAAAAGAAGGCA3780              ACCTCACGATTTCTTCCGATAAAATTAATATCACCAAACAGATAACAATCAAAAAGGGTA3840              TTGATGGAGAGGACTCTAGTTCAGATGCGACAAGTAATGCCAACCTAACTATTAAAACCA3900              AAGAATTGAAATTGACAGAAGACCTAAGTATTTCAGGTTTCAATAAAGCAGAGATTACAG3960              CCAAAGATGGTAGAGATTTAACTATTGGCAACAGTAATGACGGTAACAGCGGTGCCGAAG4020              CCAAAACAGTAACTTTTAACAATGTTAAAGATTCAAAAATCTCTGCTGACGGTCACAATG4080              TGACACTAAATAGCAAAGTGAAAACATCTAGCAGCAATGGCGGACGTGAAAGCAATAGCG4140              ACAACGATACCGGCTTAACTATTACTGCAAAAAATGTAGAAGTAAACAAAGATATTACTT4200              CTCTCAAAACAGTAAATATCACCGCGTCGGAAAAGGTTACCACCACAGCAGGCTCGACCA4260              TTAACGCAACAAATGGCAAAGCAAGTATTACAACCAAAACAGGTGATATCAGCGGTACGA4320              TTTCCGGTAACACGGTAAGTGTTAGCGCGACTGGTGATTTAACCACTAAATCCGGCTCAA4380              AAATTGAAGCGAAATCGGGTGAGGCTAATGTAACAAGTGCAACAGGTACAATTGGCGGTA4440              CAATTTCCGGTAATACGGTAAATGTTACGGCAAACGCTGGCGATTTAACAGTTGGGAATG4500              GCGCAGAAATTAATGCGACAGAAGGAGCTGCAACCTTAACCGCAACAGGGAATACCTTGA4560              CTACTGAAGCCGGTTCTAGCATCACTTCAACTAAGGGTCAGGTAGACCTCTTGGCTCAGA4620              ATGGTAGCATCGCAGGAAGCATTAATGCTGCTAATGTGACATTAAATACTACAGGCACCT4680              TAACCACCGTGGCAGGCTCGGATATTAAAGCAACCAGCGGCACCTTGGTTATTAACGCAA4740              AAGATGCTAAGCTAAATGGTGATGCATCAGGTGATAGTACAGAAGTGAATGCAGTCAACG4800              ACTGGGGATTTGGTAGTGTGACTGCGGCAACCTCAAGCAGTGTGAATATCACTGGGGATT4860              TAAACACAGTAAATGGGTTAAATATCATTTCGAAAGATGGTAGAAACACTGTGCGCTTAA4920              GAGGCAAGGAAATTGAGGTGAAATATATCCAGCCAGGTGTAGCAAGTGTAGAAGAAGTAA4980              TTGAAGCGAAACGCGTCCTTGAAAAAGTAAAAGATTTATCTGATGAAGAAAGAGAAACAT5040              
TAGCTAAACTTGGTGTAAGTGCTGTACGTTTTGTTGAGCCAAATAATACAATTACAGTCA5100              ATACACAAAATGAATTTACAACCAGACCGTCAAGTCAAGTGATAATTTCTGAAGGTAAGG5160              CGTGTTTCTCAAGTGGTAATGGCGCACGAGTATGTACCAATGTTGCTGACGATGGACAGC5220              CGTAGTCAGTAATTGACAAGGTAGATTTCATCCTGCAATGAAGTCATTTTATTTTCGTAT5280              TATTTACTGTGTGGGTTAAAGTTCAGTACGGGCTTTACCCATCTTGTAAAAAATTACGGA5340              GAATACAATAAAGTATTTTTAACAGGTTATTATTATGAAAAATATAAAAAGCAGATTAAA5400              ACTCAGTGCAATATCAGTATTGCTTGGCCTGGCTTCTTCATCATTGTATGCAGAAGAAGC5460              GTTTTTAGTAAAAGGCTTTCAGTTATCTGGTGCACTTGAAACTTTAAGTGAAGACGCCCA5520              ACTGTCTGTAGCAAAATCTTTATCTAAATACCAAGGCTCGCAAACTTTAACAAACCTAAA5580              AACAGCACAGCTTGAATTACAGGCTGTGCTAGATAAGATTGAGCCAAATAAATTTGATGT5640              GATATTGCCGCAACAAACCATTACGGATGGCAATATCATGTTTGAGCTAGTCTCGAAATC5700              AGCCGCAGAAAGCCAAGTTTTTTATAAGGCGAGCCAGGGTTATAGTGAAGAAAATATCGC5760              TCGTAGCCTGCCATCTTTGAAACAAGGAAAAGTGTATGAAGATGGTCGTCAGTGGTTCGA5820              TTTGCGTGAATTTAATATGGCAAAAGAAAACCCGCTTAAGGTTACCCGTGTACATTACGA5880              ACTAAACCCTAAAAACAAAACCTCTAATTTGATAATTGCGGGCTTCTCGCCTTTTGGTAA5940              AACGCGTAGCTTTATTTCTTATGATAATTTCGGCGCGAGAGAGTTTAACTACCAACGTGT6000              AAGCTTGGGTTTTGTTAATGCCAATTTAACTGGTCATGATGATGTGTTAATTATACCAGT6060              ATGAGTTATGCTGATTCTAATGATATCGACGGCTTACCAAGTGCGATTAATCGTAAATTA6120              TCAAAAGGTCAATCTATCTCTGCGAATCTGAAATGGAGTTATTATCTCCCAACATTTAAC6180              CTTGGCATGGAAGACCAATTTAAAATTAATTTAGGCTACAACTACCGCCATATTAATCAA6240              ACCTCCGCGTTAAATCGCTTGGGTGAAACGAAGAAAAAATTTGCAGTATCAGGCGTAAGT6300              GCAGGCATTGATGGACATATCCAATTTACCCCTAAAACAATCTTTAATATTGATTTAACT6360              CATCATTATTACGCGAGTAAATTACCAGGCTCTTTTGGAATGGAGCGCATTGGCGAAACA6420              TTTAATCGCAGCTATCACATTAGCACAGCCAGTTTAGGGTTGAGTCAAGAGTTTGCTCAA6480              GGTTGGCATTTTAGCAGTCAATTATCAGGTCAATTTACTCTACAAGATATTAGCAGTATA6540              
GATTTATTCTCTGTAACAGGTACTTATGGCGTCAGAGGCTTTAAATACGGCGGTGCAAGT6600              GGTGAGCGCGGTCTTGTATGGCGTAATGAATTAAGTATGCCAAAATACACCCGCTTCCAA6660              ATCAGCCCTTATGCGTTTTATGATGCAGGTCAGTTCCGTTATAATAGCGAAAATGCTAAA6720              ACTTACGGCGAAGATATGCACACGGTATCCTCTGCGGGTTTAGGCATTAAAACCTCTCCT6780              ACACAAAACTTAAGCCTAGATGCTTTTGTTGCTCGTCGCTTTGCAAATGCCAATAGTGAC6840              AATTTGAATGGCAACAAAAAACGCACAAGCTCACCTACAACCTTCTGGGGGAGATTAACA6900              TTCAGTTTCTAACCCTGAAATTTAATCAACTGGTAAGCGTTCCGCCTACCAGTTTATAAC6960              TATATGCTTTACCCGCCAATTTACAGTCTATAGGCAACCCTGTTTTTACCCTTATATATC7020              AAATAAACAAGCTAAGCTGAGCTAAGCAAACCAAGCAAACTCAAGCAAGCCAAGTAATAC7080              TAAAAAAACAATTTATATGATAAACTAAAGTATACTCCATGCCATGGCGATACAAGGGAT7140              TTAATAATATGACAAAAGAAAATTTGCAAAACGCTCCTCAAGATGCGACCGCTTTACTTG7200              CGGAATTAAGCAACAATCAAACTCCCCTGCGAATATTTAAACAACCACGCAAGCCCAGCC7260              TATTACGCTTGGAACAACATATCGCAAAAAAAGATTATGAGTTTGCTTGTCGTGAATTAA7320              TGGTGATTCTGGAAAAAATGGACGCTAATTTTGGAGGCGTTCACGATATTGAATTTGACG7380              CACCCGCTCAGCTGGCATATCTACCCGAAAAATTACTAATTTATTTTGCCACTCGTCTCG7440              CTAATGCAATTACAACACTCTTTTCCGACCCCGAATTGGCAATTTCTGAAGAAGGGGCGT7500              TAAAGATGATTAGCCTGCAACGCTGGTTGACGCTGATTTTTGCCTCTTCCCCCTACGTTA7560              ACGCAGACCATATTCTCAATAAATATAATATCAACCCAGATTCCGAAGGTGGCTTTCATT7620              TAGCAACAGACAACTCTTCTATTGCTAAATTCTGTATTTTTTACTTACCCGAATCCAATG7680              TCAATATGAGTTTAGATGCGTTATGGGCAGGGAATCAACAACTTTGTGCTTCATTGTGTT7740              TTGCGTTGCAGTCTTCACGTTTTATTGGTACCGCATCTGCGTTTCATAAAAGAGCGGTGG7800              TTTTACAGTGGTTTCCTAAAAAACTCGCCGAAATTGCTAATTTAGATGAATTGCCTGCAA7860              ATATCCTTCATGATGTATATATGCACTGCAGTTATGATTTAGCAAAAAACAAGCACGATG7920              TTAAGCGTCCATTAAACGAACTTGTCCGCAAGCATATCCTCACGCAAGGATGGCAAGACC7980              GCTACCTTTACACCTTAGGTAAAAAGGACGGCAAACCTGTGATGATGGTACTGCTTGAAC8040              
ATTTTAATTCGGGACATTCGATTTATCGTACACATTCAACTTCAATGATTGCTGCTCGAG8100              AAAAATTCTATTTAGTCGGCTTAGGCCATGAGGGCGTTGATAAAATAGGTCGAGAAGTGT8160              TTGACGAGTTCTTTGAAATCAGTAGCAATAATATAATGGAGAGACTGTTTTTTATCCGTA8220              AACAGTGCGAAACTTTCCAACCCGCAGTGTTCTATATGCCAAGCATTGGCATGGATATTA8280              CCACGATTTTTGTGAGCAACACTCGGCTTGCCCCTATTCAAGCTGTAGCCCTGGGTCATC8340              CTGCCACTACGCATTCTGAATTTATTGATTATGTCATCGTAGAAGATGATTATGTGGGCA8400              GTGAAGATTGTTTCAGCGAAACCCTTTTACGCTTACCCAAAGATGCCCTACCTTATGTAC8460              CTTCTGCACTCGCCCCACAAAAAGTGGATTATGTACTCAGGGAAAACCCTGAAGTAGTCA8520              ATATCGGTATTGCCGCTACCACAATGAAATTAAACCCTGAATTTTTGCTAACATTGCAAG8580              AAATCAGAGATAAAGCTAAAGTCAAAATACATTTTCATTTCGCACTTGGACAATCAACAG8640              GCTTGACACACCCTTATGTCAAATGGTTTATCGAAAGCTATTTAGGTGACGATGCCACTG8700              CACATCCCCACGCACCTTATCACGATTATCTGGCAATATTGCGTGATTGCGATATGCTAC8760              TAAATCCGTTTCCTTTCGGTAATACTAACGGCATAATTGATATGGTTACATTAGGTTTAG8820              TTGGTGTATGCAAAACGGGGGATGAAGTACATGAACATATTGATGAAGGTCTGTTTAAAC8880              GCTTAGGACTACCAGAATGGCTGATAGCCGACACACGAGAAACATATATTGAATGTGCTT8940              TGCGTCTAGCAGAAAACCATCAAGAACGCCTTGAACTCCGTCGTTACATCATAGAAAACA9000              ACGGCTTACAAAAGCTTTTTACAGGCGACCCTCGTCCATTGGGCAAAATACTGCTTAAGA9060              AAACAAATGAATGGAAGCGGAAGCACTTGAGTAAAAAATAACGGTTTTTTAAAGTAAAAG9120              TGCGGTTAATTTTCAAAGCGTTTTAAAAACCTCTCAAAAATCAACCGCACTTTTATCTTT9180              ATAACGATCCCGCACGCTGACAGTTTATCAGCCTCCCGCCATAAAACTCCGCCTTTCATG9240              GCGGAGATTTTAGCCAAAACTGGCAGAAATTAAAGGCTAAAATCACCAAATTGCACCACA9300              AAATCACCAATACCCACAAAAAA9323                                                   (2) INFORMATION FOR SEQ ID NO:7:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 4287 base pairs                                                   (B) TYPE: nucleic acid                            
                            (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:                                       GATCAATCTGGGCGATATTTTTGCCAAAGGTGGTAACATTAATGTCCGCGCTGCCACTAT60                TCGCAATAAAGGTAAACTTTCTGCCGACTCTGTAAGCAAAGATAAAAGTGGTAACATTGT120               TCTCTCTGCCAAAGAAGGTGAAGCGGAAATTGGCGGTGTAATTTCCGCTCAAAATCAGCA180               AGCCAAAGGTGGTAAGTTGATGATTACAGGCGATAAAGTTACATTGAAAACGGGTGCACT240               TATCGACCTTTCGGGTAAAGAAGGGGGAGAAACTTATCTTGGCGGTGACGAGCGTGGCGA300               AGGTAAAAACGGCATTCAATTAGCAAAGAAAACCACTTTAGAAAAAGGCTCAACAATTAA360               TGTGTCAGGTAAAGAAAAAGCTGGGCGCGCTATTGTATGGGGCGATATTGCGTTAATTGA420               CGGCAATATTAATGCCCAAGGTAAAGATATCGCTAAAACTGGTGGTTTTGTGGAGACGTC480               GGGGCATTACTTATCCATTGATGATAACGCAATTGTTAAAACAAAAGAATGGCTACTAGA540               CCCAGAGAATGTGACTATTGAAGCTCCTTCCGCTTCTCGCGTCGAGCTGGGTGCCGATAG600               GAATTCCCACTCGGCAGAGGTGATAAAAGTGACCCTAAAAAAAAATAACACCTCCTTGAC660               AACACTAACCAATACAACCATTTCAAATCTTCTGAAAAGTGCCCACGTGGTGAACATAAC720               GGCAAGGAGAAAACTTACCGTTAATAGCTCTATCAGTATAGAAAGAGGCTCCCACTTAAT780               TCTCCACAGTGAAGGTCAGGGCGGTCAAGGTGTTCAGATTGATAAAGATATTACTTCTGA840               AGGCGGAAATTTAACCATTTATTCTGGCGGATGGGTTGATGTTCATAAAAATATTACGCT900               TGGTAGCGGCTTTTTAAACATCACAACTAAAGAAGGAGATATCGCCTTCGAAGACAAGTC960               TGGACGGAACAACCTAACCATTACAGCCCAAGGGACCATCACCTCAGGTAATAGTAACGG1020              CTTTAGATTTAACAACGTCTCTCTAAACAGCCTTGGCGGAAAGCTGAGCTTTACTGACAG1080              CAGAGAGGACAGAGGTAGAAGAACTAAGGGTAATATCTCAAACAAATTTGACGGAACGTT1140              AAACATTTCCGGAACTGTAGATATCTCAATGAAAGCACCCAAAGTCAGCTGGTTTTACAG1200              AGACAAAGGACGCACCTACTGGAACGTAACCACTTTAAATGTTACCTCGGGTAGTAAATT1260              
TAACCTCTCCATTGACAGCACAGGAAGTGGCTCAACAGGTCCAAGCATACGCAATGCAGA1320              ATTAAATGGCATAACATTTAATAAAGCCACTTTTAATATCGCACAAGGCTCAACAGCTAA1380              CTTTAGCATCAAGGCATCAATAATGCCCTTTAAGAGTAACGCTAACTACGCATTATTTAA1440              TGAAGATATTTCAGTCTCAGGGGGGGGTAGCGTTAATTTCAAACTTAACGCCTCATCTAG1500              CAACATACAAACCCCTGGCGTAATTATAAAATCTCAAAACTTTAATGTCTCAGGAGGGTC1560              AACTTTAAATCTCAAGGCTGAAGGTTCAACAGAAACCGCTTTTTCAATAGAAAATGATTT1620              AAACTTAAACGCCACCGGTGGCAATATAACAATCAGACAAGTCGAGGGTACCGATTCACG1680              CGTCAACAAAGGTGTCGCAGCCAAAAAAAACATAACTTTTAAAGGGGGTAATATCACCTT1740              CGGCTCTCAAAAAGCCACAACAGAAATCAAAGGCAATGTTACCATCAATAAAAACACTAA1800              CGCTACTCTTCGTGGTGCGAATTTTGCCGAAAACAAATCGCCTTTAAATATAGCAGGAAA1860              TGTTATTAATAATGGCAACCTTACCACTGCCGGCTCCATTATCAATATAGCCGGAAATCT1920              TACTGTTTCAAAAGGCGCTAACCTTCAAGCTATAACAAATTACACTTTTAATGTAGCCGG1980              CTCATTTGACAACAATGGCGCTTCAAACATTTCCATTGCCAGAGGAGGGGCTAAATTTAA2040              AGATATCAATAACACCAGTAGCTTAAATATTACCACCAACTCTGATACCACTTACCGCAC2100              CATTATAAAAGGCAATATATCCAACAAATCAGGTGATTTGAATATTATTGATAAAAAAAG2160              CGACGCTGAAATCCAAATTGGCGGCAATATCTCACAAAAAGAAGGCAATCTCACAATTTC2220              TTCTGATAAAGTAAATATTACCAATCAGATAACAATCAAAGCAGGCGTTGAAGGGGGGCG2280              TTCTGATTCAAGTGAGGCAGAAAATGCTAACCTAACTATTCAAACCAAAGAGTTAAAATT2340              GGCAGGAGACCTAAATATTTCAGGCTTTAATAAAGCAGAAATTACAGCTAAAAATGGCAG2400              TGATTTAACTATTGGCAATGCTAGCGGTGGTAATGCTGATGCTAAAAAAGTGACTTTTGA2460              CAAGGTTAAAGATTCAAAAATCTCGACTGACGGTCACAATGTAACACTAAATAGCGAAGT2520              GAAAACGTCTAATGGTAGTAGCAATGCTGGTAATGATAACAGCACCGGTTTAACCATTTC2580              CGCAAAAGATGTAACGGTAAACAATAACGTTACCTCCCACAAGACAATAAATATCTCTGC2640              CGCAGCAGGAAATGTAACAACCAAAGAAGGCACAACTATCAATGCAACCACAGGCAGCGT2700              GGAAGTAACTGCTCAAAATGGTACAATTAAAGGCAACATTACCTCGCAAAATGTAACAGT2760              
GACAGCAACAGAAAATCTTGTTACCACAGAGAATGCTGTCATTAATGCAACCAGCGGCAC2820              AGTAAACATTAGTACAAAAACAGGGGATATTAAAGGTGGAATTGAATCAACTTCCGGTAA2880              TGTAAATATTACAGCGAGCGGCAATACACTTAAGGTAAGTAATATCACTGGTCAAGATGT2940              AACAGTAACAGCGGATGCAGGAGCCTTGACAACTACAGCAGGCTCAACCATTAGTGCGAC3000              AACAGGCAATGCAAATATTACAACCAAAACAGGTGATATCAACGGTAAAGTTGAATCCAG3060              CTCCGGCTCTGTAACACTTGTTGCAACTGGAGCAACTCTTGCTGTAGGTAATATTTCAGG3120              TAACACTGTTACTATTACTGCGGATAGCGGTAAATTAACCTCCACAGTAGGTTCTACAAT3180              TAATGGGACTAATAGTGTAACCACCTCAAGCCAATCAGGCGATATTGAAGGTACAATTTC3240              TGGTAATACAGTAAATGTTACAGCAAGCACTGGTGATTTAACTATTGGAAATAGTGCAAA3300              AGTTGAAGCGAAAAATGGAGCTGCAACCTTAACTGCTGAATCAGGCAAATTAACCACCCA3360              AACAGGCTCTAGCATTACCTCAAGCAATGGTCAGACAACTCTTACAGCCAAGGATAGCAG3420              TATCGCAGGAAACATTAATGCTGCTAATGTGACGTTAAATACCACAGGCACTTTAACTAC3480              TACAGGGGATTCAAAGATTAACGCAACCAGTGGTACCTTAACAATCAATGCAAAAGATGC3540              CAAATTAGATGGTGCTGCATCAGGTGACCGCACAGTAGTAAATGCAACTAACGCAAGTGG3600              CTCTGGTAACGTGACTGCGAAAACCTCAAGCAGCGTGAATATCACCGGGGATTTAAACAC3660              AATAAATGGGTTAAATATCATTTCGGAAAATGGTAGAAACACTGTGCGCTTAAGAGGCAA3720              GGAAATTGATGTGAAATATATCCAACCAGGTGTAGCAAGCGTAGAAGAGGTAATTGAAGC3780              GAAACGCGTCCTTGAGAAGGTAAAAGATTTATCTGATGAAGAAAGAGAAACACTAGCCAA3840              ACTTGGTGTAAGTGCTGTACGTTTCGTTGAGCCAAATAATGCCATTACGGTTAATACACA3900              AAACGAGTTTACAACCAAACCATCAAGTCAAGTGACAATTTCTGAAGGTAAGGCGTGTTT3960              CTCAAGTGGTAATGGCGCACGAGTATGTACCAATGTTGCTGACGATGGACAGCAGTAGTC4020              AGTAATTGACAAGGTAGATTTCATCCTGCAATGAAGTCATTTTATTTTCGTATTATTTAC4080              TGTGTGGGTTAAAGTTCAGTACGGGCTTTACCCACCTTGTAAAAAATTACGAAAAATACA4140              ATAAAGTATTTTTAACAGGTTATTATTATGAAAAACATAAAAAGCAGATTAAAACTCAGT4200              GCAATATCAATATTGCTTGGCTTGGCTTCTTCATCGACGTATGCAGAAGAAGCGTTTTTA4260              GTAAAAGGCTTTCAGTTATCTGGCGCG4287                   
                            (2) INFORMATION FOR SEQ ID NO:8:                                              (i) SEQUENCE CHARACTERISTICS:                                                 (A) LENGTH: 4702 base pairs                                                   (B) TYPE: nucleic acid                                                        (C) STRANDEDNESS: single                                                      (D) TOPOLOGY: linear                                                          (ii) MOLECULE TYPE: DNA (genomic)                                             (xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:                                       GGGAATGAGCGTCGTACACGGTACAGCAACCATGCAAGTAGACGGCAATAAAACCACTAT60                CCGTAATAGCATCAATGCTATCATCAATTGGAAACAATTTAACATTGACCAAAATGAAAT120               GGAGCAGTTTTTACAAGAAAGCAGCAACTCTGCCGTTTTCAACCGTGTTACATCTGACCA180               AATCTCCCAATTAAAAGGGATTTTAGATTCTAACGGACAAGTCTTTTTAATCAACCCAAA240               TGGTATCACAATAGGTAAAGACGCAATTATTAACACTAATGGCTTTACTGCTTCTACGCT300               AGACATTTCTAACGAAAACATCAAGGCGCGTAATTTCACCCTTGAGCAAACCAAGGATAA360               AGCACTCGCTGAAATCGTGAATCACGGTTTAATTACCGTTGGTAAAGACGGTAGCGTAAA420               CCTTATTGGTGGCAAAGTGAAAAACGAGGGCGTGATTAGCGTAAATGGCGGTAGTATTTC480               TTTACTTGCAGGGCAAAAAATCACCATCAGCGATATAATAAATCCAACCATCACTTACAG540               CATTGCTGCACCTGAAAACGAAGCGATCAATCTGGGCGATATTTTTGCCAAAGGTGGTAA600               CATTAATGTCCGCGCTGCCACTATTCGCAATAAAGGTAAACTTTCTGCCGACTCTGTAAG660               CAAAGATAAAAGTGGTAACATTGTTCTCTCTGCCAAAGAAGGTGAAGCGGAAATTGGCGG720               TGTAATTTCCGCTCAAAATCAGCAAGCCAAAGGTGGTAAGTTGATGATTACAGGTGATAA780               AGTCACATTAAAAACAGGTGCAGTTATCGACCTTTCAGGTAAAGAAGGGGGAGAGACTTA840               TCTTGGCGGTGATGAGCGTGGCGAAGGTAAAAATGGTATTCAATTAGCGAAGAAAACCTC900               TTTAGAAAAAGGCTCGACAATTAATGTATCAGGCAAAGAAAAAGGCGGGCGCGCTATTGT960               ATGGGGCGATATTGCATTAATTAATGGTAACATTAATGCTCAAGGTAGCGATATTGCTAA1020              
AACTGGCGGCTTTGTGGAAACATCAGGACATGACTTATCCATTGGTGATGATGTGATTGT1080              TGACGCTAAAGAGTGGTTATTAGACCCAGATGATGTGTCCATTGAAACTCTTACATCTGG1140              ACGCAATAATACCGGCGAAAACCAAGGATATACAACAGGAGATGGGACTAAAGAGTCACC1200              TAAAGGTAATAGTATTTCTAAACCTACATTAACAAACTCAACTCTTGAGCAAATCCTAAG1260              AAGAGGTTCTTATGTTAATATCACTGCTAATAATAGAATTTATGTTAATAGCTCCATCAA1320              CTTATCTAATGGCAGTTTAACACTTCACACTAAACGAGATGGAGTTAAAATTAACGGTGA1380              TATTACCTCAAACGAAAATGGTAATTTAACCATTAAAGCAGGCTCTTGGGTTGATGTTCA1440              TAAAAACATCACGCTTGGTACGGGTTTTTTCAATATTGTCGCTGGGGATTCTGTAGCTTT1500              TGAGAGAGAGGGCGATAAAGCACGTAACGCAACAGATGCTCAAATTACCGCACAAGGGAC1560              GATAACCGTCAATAAAGATGATAAACAATTTAGATTCAATAATGTATCTATTAACGGGAC1620              GGGCAAGGGTTTAAAGTTTATTGCAAATCAAAATAATTTCACTCATAAATTTGATGGCGA1680              AATTAACATATCTGGAATAGTAACAATTAACCAAACCACGAAAAAAGATGTTAAATACTG1740              GAATGCATCAAAAGACTCTTACTGGAATGTTTCTTCTCTTACTTTGAATACGGTGCAAAA1800              ATTTACCTTTATAAAATTCGTTGATAGCGGCTCAAATTCCCAAGATTTGAGGTCATCACG1860              TAGAAGTTTTGCAGGCGTACATTTTAACGGCATCGGAGGCAAAACAAACTTCAACATCGG1920              AGCTAACGCAAAAGCCTTATTTAAATTAAAACCAAACGCCGCTACAGACCCAAAAAAAGA1980              ATTACCTATTACTTTTAACGCCAACATTACAGCTACCGGTAACAGTGATAGCTCTGTGAT2040              GTTTGACATACACGCCAATCTTACCTCTAGAGCTGCCGGCATAAACATGGATTCAATTAA2100              CATTACCGGCGGGCTTGACTTTTCCATAACATCCCATAATCGCAATAGTAATGCTTTTGA2160              AATCAAAAAAGACTTAACTATAAATGCAACTGGCTCGAATTTTAGTCTTAAGCAAACGAA2220              AGATTCTTTTTATAATGAATACAGCAAACACGCCATTAACTCAAGTCATAATCTAACCAT2280              TCTTGGCGGCAATGTCACTCTAGGTGGGGAAAATTCAAGCAGTAGCATTACGGGCAATAT2340              CAATATCACCAATAAAGCAAATGTTACATTACAAGCTGACACCAGCAACAGCAACACAGG2400              CTTGAAGAAAAGAACTCTAACTCTTGGCAATATATCTGTTGAGGGGAATTTAAGCCTAAC2460              TGGTGCAAATGCAAACATTGTCGGCAATCTTTCTATTGCAGAAGATTCCACATTTAAAGG2520              
AGAAGCCAGTGACAACCTAAACATCACCGGCACCTTTACCAACAACGGTACCGCCAACAT2580              TAATATAAAACAAGGAGTGGTAAAACTCCAAGGCGATATTATCAATAAAGGTGGTTTAAA2640              TATCACTACTAACGCCTCAGGCACTCAAAAAACCATTATTAACGGAAATATAACTAACGA2700              AAAAGGCGACTTAAACATCAAGAATATTAAAGCCGACGCCGAAATCCAAATTGGCGGCAA2760              TATCTCACAAAAAGAAGGCAATCTCACAATTTCTTCTGATAAAGTAAATATTACCAATCA2820              GATAACAATCAAAGCAGGCGTTGAAGGGGGGCGTTCTGATTCAAGTGAGGCAGAAAATGC2880              TAACCTAACTATTCAAACCAAAGAGTTAAAATTGGCAGGAGACCTAAATATTTCAGGCTT2940              TAATAAAGCAGAAATTACAGCTAAAAATGGCAGTGATTTAACTATTGGCAATGCTAGCGG3000              TGGTAATGCTGATGCTAAAAAAGTGACTTTTGACAAGGTTAAAGATTCAAAAATCTCGAC3060              TGACGGTCACAATGTAACACTAAATAGCGAAGTGAAAACGTCTAATGGTAGTAGCAATGC3120              TGGTAATGATAACAGCACCGGTTTAACCATTTCCGCAAAAGATGTAACGGTAAACAATAA3180              CGTTACCTCCCACAAGACAATAAATATCTCTGCCGCAGCAGGAAATGTAACAACCAAAGA3240              AGGCACAACTATCAATGCAACCACAGGCAGCGTGGAAGTAACTGCTCAAAATGGTACAAT3300              TAAAGGCAACATTACCTCGCAAAATGTAACAGTGACAGCAACAGAAAATCTTGTTACCAC3360              AGAGAATGCTGTCATTAATGCAACCAGCGGCACAGTAAACATTAGTACAAAAACAGGGGA3420              TATTAAAGGTGGAATTGAATCAACTTCCGGTAATGTAAATATTACAGCGAGCGGCAATAC3480              ACTTAAGGTAAGTAATATCACTGGTCAAGATGTAACAGTAACAGCGGATGCAGGAGCCTT3540              GACAACTACAGCAGGCTCAACCATTAGTGCGACAACAGGCAATGCAAATATTACAACCAA3600              AACAGGTGATATCAACGGTAAAGTTGAATCCAGCTCCGGCTCTGTAACACTTGTTGCAAC3660              TGGAGCAACTCTTGCTGTAGGTAATATTTCAGGTAACACTGTTACTATTACTGCGGATAG3720              CGGTAAATTAACCTCCACAGTAGGTTCTACAATTAATGGGACTAATAGTGTAACCACCTC3780              AAGCCAATCAGGCGATATTGAAGGTACAATTTCTGGTAATACAGTAAATGTTACAGCAAG3840              CACTGGTGATTTAACTATTGGAAATAGTGCAAAAGTTGAAGCGAAAAATGGAGCTGCAAC3900              CTTAACTGCTGAATCAGGCAAATTAACCACCCAAACAGGCTCTAGCATTACCTCAAGCAA3960              TGGTCAGACAACTCTTACAGCCAAGGATAGCAGTATCGCAGGAAACATTAATGCTGCTAA4020              
TGTGACGTTAAATACCACAGGCACTTTAACTACTACAGGGGATTCAAAGATTAACGCAAC4080              CAGTGGTACCTTAACAATCAATGCAAAAGATGCCAAATTAGATGGTGCTGCATCAGGTGA4140              CCGCACAGTAGTAAATGCAACTAACGCAAGTGGCTCTGGTAACGTGACTGCGAAAACCTC4200              AAGCAGCGTGAATATCACCGGGGATTTAAACACAATAAATGGGTTAAATATCATTTCGGA4260              AAATGGTAGAAACACTGTGCGCTTAAGAGGCAAGGAAATTGATGTGAAATATATCCAACC4320              AGGTGTAGCAAGCGTAGAAGAGGTAATTGAAGCGAAACGCGTCCTTGAGAAGGTAAAAGA4380              TTTATCTGATGAAGAAAGAGAAACACTAGCCAAACTTGGTGTAAGTGCTGTACGTTTCGT4440              TGAGCCAAATAATGCCATTACGGTTAATACACAAAACGAGTTTACAACCAAACCATCAAG4500              TCAAGTGACAATTTCTGAAGGTAAGGCGTGTTTCTCAAGTGGTAATGGCGCACGAGTATG4560              TACCAATGTTGCTGACGATGGACAGCAGTAGTCAGTAATTGACAAGGTAGATTTCATCCT4620              GCAATGAAGTCATTTTATTTTCGTATTATTTACTGTGTGGGTTAAAGTTCAGTACGGGCT4680              TTACCCACCTTGTAAAAAATTA4702                                                    __________________________________________________________________________

What I claim is:
 1. An isolated and purified gene which encodes a high molecular weight protein having the amino acid sequence of SEQ ID: 2.
 2. The gene of claim 1 having the DNA sequence of SEQ ID: 1.
 3. The isolated and purified gene cluster of a non-typeable Haemophilus strain comprising the sequence of SEQ ID: 5.